Abstract

The role of citizens in mapping has evolved considerably over the last decade.
This chapter outlines the background to citizen sensing in mapping and sets the
scene for the chapters that follow, which highlight some of the main outcomes of
a collaborative programme of work to enhance the role of citizens in mapping.

1 Introduction

Accurate and timely maps are a fundamental resource for a vast array of
applications. Maps are, for example, central to everyday activities ranging from
route planning and the legal demarcation of space through to scientific
undertakings such as the design of nature reserves for species conservation or
the monitoring of terrestrial carbon pools in support of climate change policies.
Maps, therefore, provide a range of services, including ones that support
economic activity (e.g. location-based services) and enhance human health and
well-being (e.g. damage maps for disaster relief and humanitarian aid
programmes). Maps underpin popular location-based augmented reality mobile games
such as Pokémon Go, and gaming activity can be used to help acquire geographic
information for mapping (Antoniou and Schlieder, 2014). Map production and
updating in a rapidly changing world is, however, a major scientific and
practical challenge. The US National Academies, for example, highlight as a key
strategic question for the geographical sciences: how can we better observe,
analyse and visualise a changing world? (CSDGSND, 2010). This book is focused on
the potential of citizen sensors, typically volunteers, to help in mapping
activities. In the context of this book, we use the term mapping to refer to the
process of creating maps. The term is intended to be inclusive and thus covers
any activity from data gathering to the production of spatial and cartographic
products.

Citizens have considerable potential as a source of geographic information,
and this activity is itself a further strategic priority identified by the US
National Academies (CSDGSND, 2010). Citizens have been collecting georeferenced
data of several types for some time (Boyd and Foody, 2014), but this activity,
and its possible usefulness, is not well understood and therefore its potential
remains unfulfilled. To help advance the role of citizens in mapping, a
Cooperation in Science and Technology (COST) Action called TD1202 Mapping and
the Citizen Sensor¹ was launched; COST is a European framework to support
research on topics of global relevance. This book presents some of the work that
has arisen from the Action's activities.

Mapping has a long history, and ‘best practices’ for authoritative mapping
have been established and used for many years. For example, standards for
topographic mapping have been defined and used by major government agencies
(Olteanu-Raimond et al., 2017). Similarly, in relation to thematic mapping
from remote sensing, best practices for map validation have been defined
(Strahler et al., 2006; Olofsson et al., 2014). The various bodies engaged in
authoritative mapping, however, often cannot meet mapping requirements or
‘best practices’, which can be impractical to implement (Rahmatizadeh et al.,
2016); examples include data collection that follows a strict probabilistic
sample design and the need for large sample sizes for thematic map validation.
In this situation there are a variety of ways in which mapping activity could
progress. The problems of authoritative mapping could simply be recognised and
standards lowered. This rather negative approach would appear to be a retrograde
step. It would, for example, leave thematic maps unvalidated, representing no
more than one possible representation, one untested hypothesis, of contestable
value (Strahler et al., 2006; McRoberts, 2011). Alternatively, and more
constructively, techniques that require only relatively limited amounts of
reference data could be used. For example, semi-supervised techniques that can
make use of unlabelled information could be used in the production of thematic
maps from remote sensing (Bruzzone et al., 2006), and model-based rather
than standard design-based inference could be adopted in map evaluation
(McRoberts, 2010; Foody, 2012). A further alternative is to utilise the enormous
potential of citizen sensors. For example, citizen observations have already
been used as a cost-effective alternative source of reference data for hybrid
map generation (Schepaschenko et al., 2015; See et al., 2015).

The role of citizens has been noted in a variety of subjects, from astronomy
to zoology (Raddick and Szalay, 2010; Dickinson et al., 2010; Wiersma, 2010;
Muller et al., 2015; Rossiter et al., 2015). Citizens have also already
contributed greatly to mapping activities, including, for example, to major
programmes such as bird species distribution mapping (Dickinson et al., 2010;
Wiersma, 2010) and to the pioneering production of national land cover datasets
such as the first land utilisation survey of the UK in the 1930s (Parece and
Campbell, 2015). The role of citizens in mapping has, however, benefited greatly
from recent advances in geoinformation technologies. Technological advancement
has fostered the emerging role of the citizen as a source of data. Due to the
proliferation of location-aware devices and the opportunities of Web 2.0, it is
now possible for citizens to easily acquire, share and use geographical
information. This activity has been named or described in a variety of ways,
notably as crowdsourcing, volunteered geographic information (VGI), user
generated spatial content, neogeographies and the pervasive media (See et al.,
2016). These various terms are often used to help differentiate between activity
that is passive or active, and between information that is truly volunteered or
that is being provided for a modest, and possibly non-financial, reward. In this
book, there is no particular desire to distinguish between the different
approaches, although the detail can sometimes be important, and the focus is
simply on citizen-derived geographical data. The citizens contributing data may
be anyone: they could be children
or adults, they may be amateurs or experts, they may have differing motivations
and may even be contributing without knowing so.

Citizen sensing has dramatically affected mapping and map use, impacting on
routine daily life activities such as gaming and tourism as well as on science
and technology more generally. Resources such as Google Earth, Bing Maps and
even maps that are citizen-generated through projects such as OpenStreetMap
(OSM) are now widely and routinely used by diverse amateur and professional
communities. Furthermore, possibly radical impacts on mapping activity are
likely to occur (Olteanu-Raimond et al., 2017) and some argue that a new
data-rich paradigm is emerging with VGI (Jiang and Thill, 2015; Li et al.,
2016). These future developments should arise from the trend for continued
technological advances but also from an increased provision of free, or at least
inexpensive, remote sensing data and increasing access to official government
data resources. These tremendous opportunities do, of course, come with
challenges. In the big data era, there is
now, paradoxically, so much data that problems in mapping may arise. The curse
of data volume can be likened to the widely encountered Hughes phenomenon,
in which map accuracy declines as data dimensionality increases for a fixed
ground dataset (Richards, 2013). Immense volumes of data from future remote
sensing will amount to a deluge; for example, the Sentinel-2 satellites alone
will produce 1.6 TB of data per day, and yet they are just one pair among the
more than 350 Earth-observing satellites that are to be launched by 40 different
countries by 2023 (Foody et al., 2015). There are also clear challenges with
citizen-derived data. These datasets can be voluminous, as with other components
of the developing field of big geospatial data, and their size and dynamic
nature may need to be recognised explicitly if they are to be used efficiently
and effectively (Herrera et al., 2015; Li et al., 2016). Citizen-derived data
are also often of varied (and typically unknown) quality and trust levels
(Goodchild and Glennon, 2010).
Moreover, the data generated may be poorly described and associated with little
if any metadata. To realise the full potential of citizen sensing, there is a
need to establish good practices and perhaps even protocols for some activities
(Schade and Tsinaraki, 2016). This will be a challenging task, not least due to
issues such as the diversity of datasets generated, the range of devices used
and sensitivities to error and uncertainty, which are often
application-specific. Additionally, there is a suite of other major
considerations in the use of VGI, including ownership rights, as well as
privacy, legal and ethical issues (Granell and Ostermann, 2016). As a further
complication, there may be tensions between different parts of the community,
with, for example, some calling for anonymity and privacy as an essential
feature (Mozas-Calvache, 2016) while others want information on volunteers to be
available to aid assessments of trust (Zhao et al., 2016). There is also clearly
a strong desire to not ‘kill off the golden goose’ by laying down strict rules
and procedures that end up making volunteering an onerous task and ultimately
deter the provision of citizen-derived data. A variety of priorities have been
identified that must be addressed in order to facilitate citizen sensing,
including issues such as standardisation and interoperability (Brown et al.,
2013), and groups are working on defining good practices to encourage
mapping-related applications (Pocock et al., 2014a; 2014b). This book reports on
some of the activities of one group, the participants of COST Action TD1202.
This Action has addressed a wide range of issues connected with citizen sensing
in mapping, from advice on photography that might be uploaded to social media
sites (Antoniou et al., 2016) to informing the activities of European national
mapping agencies (NMAs) (Olteanu-Raimond et al., 2017). The production of the
book involved considerable input from the Action and beyond. We are grateful to
all who helped bring this book to fruition, from authors to publishers, but we
also wish to highlight here the significant inputs from Benedicte Bucher, who
reviewed the manuscript for publication, and Nourane Clostre, who copyedited it.


2 Outline of the Book

This book is intended to closely reflect the main research themes of COST
Action TD1202. One of the first themes addressed was how VGI is acquired,
managed, stored and disseminated. Building upon a review that systematically
evaluated VGI websites and mobile applications to characterise VGI (See et al.,
2016), Chapter 2 provides an overview of different sources of VGI for mapping.
The sources are first distinguished by (i) whether the VGI can be considered
as framework data (i.e. of the type generally collected by NMAs) or whether
they fall into ‘other’ types of data (e.g. weather and traffic data) and (ii)
whether the VGI is actively or passively collected. The chapter then provides a
range of examples that illustrate these four types of citizen-contributed data,
as well as a brief discussion on 3D VGI. Chapter 3 then discusses one of the
most successful VGI projects, OSM, and provides a comprehensive introduction to
this data source, including how it is being used in a range of services and
applications in education, mapping, visualisation and research. The current
status and positioning of OSM as a VGI project is also evaluated. The chapter
then closes with discussions on future issues that need to be considered by
contributors to and users of OSM in order for it to continue its success and
growth. In Chapter 4, the emphasis shifts to exploring automated mapmaking with
the use of OSM data. The chapter starts by examining why traditional automated
mapping processes are not adapted to VGI and describes attempts to solve this
problem. The focus then turns towards the level of detail of OSM features and
how it can be inferred and harmonised for different features, with the aim of
aiding map generalisation. How other VGI sources, such as geotagged photographs,
can help to evaluate the quality of OSM prior to the application of any
automatic mapmaking processes is also presented. Finally, issues related to
advanced map stylisation with VGI are discussed.

Another prominent theme of the Action has been to gain a better understanding
of the motivations of contributors to VGI, and this theme is outlined
in Chapter 5. This chapter reviews the literature on motivation and incentives
for participation in VGI projects and then presents case studies to reflect on
what motivations and incentives have worked well, including how to sustain
participation in VGI activities in the longer term. When considering citizens
as part of the VGI equation, legal issues and issues such as data privacy and
the ethics of data use and reuse immediately come to the forefront. These are
discussed in detail in Chapter 6 with specific reference to VGI as a unique
source of information.

The quality of citizen-sensor-derived VGI is often a problem, as sources range
from naive, poorly trained citizens to authoritative experts and may even
include people contributing erroneous data maliciously. Hence another major
theme of the Action has been data quality. It is important to note that VGI can
be as good as, if not better than, authoritative datasets in terms of quality
(Antoniou and Skopeliti, 2015; See et al., 2013; Dorn et al., 2015). However,
even if the data collected could be trusted in terms of features such as their
accuracy, there are a variety of other concerns, relating to issues such as the
spatial sampling and bias of data collection (Brown, 2017) and the ability to
repeat and replicate studies, that may limit the scientific value of the data
(Ostermann and Granell, 2017). Much VGI is collected opportunistically and is
spatially biased, for instance by digital divides between urban and rural
regions or between developed and developing countries (Estima et al., 2014; Neis
and Zielstra, 2014). There are also social divides, with most contributions made
by young citizens who are technologically savvy (Haworth et al., 2015). Some of
the Action's work has focused on how VGI could be put to good use in map
validation (Fonte et al., 2015), taking quality considerations into account. In
this book, Chapters 7 to 9 all deal with quality-related issues of VGI. Chapter
7 is dedicated to the assessment of VGI quality, and presents the challenges
that this type of data raises for quality assessment. It provides an overview of
how the data quality elements included in the ISO 19157 standard can be applied
to VGI as well as of the limitations of these elements. Additional indicators
that can be used to assess VGI quality are then described. Efforts to establish
workflows to assess VGI data quality are then presented and discussed, as well
as efforts to combine data quality indicators to assess VGI fitness-for-use.

Returning to OSM, Chapter 8 discusses the evolution of OSM quality from a
novel point of view; the chapter deviates from the more traditional
quality measurements or quality statistics used in most OSM quality studies
and examines the evolution of OSM data quality as a function of the OSM
micro-environment, such as OSM specifications and OSM editors. The evolution
of OSM specifications, taking into account a number of different factors
that directly affect the quality of contributions, is examined. The evolution of
OSM editors is also presented, as they are the entry point for all OSM
contributions. Finally, the combined impact of these two factors on the overall
OSM quality is discussed. In Chapter 9, a framework for VGI quality
visualisation is presented that supports both the communication and the
exploration of VGI quality. This framework is based on four factors: the
available methods for quality visualisation of spatial data; the nature of VGI
data quality; user profiles; and the visualisation environment. The chapter then
discusses how the framework can be implemented with VGI data.

One critical issue related to the diversity and quality of spatial data is the
need to develop good practices. Here, there is a tension between the desire to
encourage volunteers without constraining their involvement and the desire
to acquire useful data. The latter could be aided by the specification of best
practices or even protocols, but if these become too onerous they may actually
act to deter volunteers. Since, for example, much current VGI is derived from
geotagged photographs and from vector data, such as in the OSM project, the
proposal of good practices for key mapping-related activities is one major way
in which the Action has helped contribute to the development of the subject.
Thus, Chapter 10 explores the role of protocols as tools to guide data
collection in VGI projects with the purpose of increasing the quality of user
contributions. With the help of technology, protocols should balance the
opposing needs of providing VGI contributors with detailed instructions and
keeping intact their enthusiasm and motivation. With this in mind, a general
protocol is formalised, and specific, real-world applications of the protocol
are presented. In Chapter 11, the means by which citizen-generated data may be
published and documented to make these datasets discoverable and reusable for
robust and reproducible science are investigated. The current state of the art
is assessed, with particular attention to the role and adoption of Data
Management Plans for citizen science initiatives and observatories. The
relevance and availability of existing data and metadata standards, vocabularies
and tools which can be employed to support interoperable storage and
dissemination of VGI are evaluated, and reference is made to examples of good
practice from existing infrastructures. Finally, in Chapter 12, the challenges
of integrating VGI with the Infrastructure for Spatial Information in the
European Community (INSPIRE) directive are discussed, contrasting Spatial Data
Infrastructures (SDIs) with VGI. This is followed by a discussion of the set of
critical issues that arise when integrating INSPIRE and VGI and of what the
prospects for integration are, providing illustrative examples. The chapter
closes with a conceptual framework for what an SDI-VGI integrated GIS platform
could look like.

A final theme in the Action has been the role of citizen sensing in map
production. The research undertaken was aimed at defining the needs of the
map-producing community, identifying the sensitivity and tolerance of mapping
methods to different types of error and uncertainty in VGI, and assessing the
potential role of current VGI efforts as well as of active citizen sensing in
the activities of NMAs. A survey of key map producers, notably European NMAs,
was undertaken to establish their current and potential future use of VGI to
inform their work (Olteanu-Raimond et al., 2017). Chapter 13 builds upon this
work and provides an overview of the experiences of some European NMAs in
engaging with VGI. It also provides recommendations to support wider engagement
with the VGI community and to help ensure that the potential of VGI in
mapping is fully exploited and used in the workflows of NMAs in the future.
Switching to another public stakeholder, i.e. urban planners, Chapter 14
discusses the value and opportunities of VGI, and of its more passive
equivalent, social media geographic information (SMGI), for urban planning. A
number of examples are provided to illustrate how this new source of information
can be used to improve visualisation, planning processes, evaluation of plans
and decision-making. The use of VGI and SMGI in smart cities initiatives is
also examined. One recent trend has been towards the development of citizen
observatories and hence Chapter 15 discusses their increasing role in engaging
citizens in science, environmental monitoring and policy-making. The
chapter provides an overview of existing and planned citizen observatories
and of where further developments are happening at the European level. The
chapter closes with a discussion of the key challenges and development needs
for policy- and decision-makers in the future.

The term VGI has been in existence for only a decade, yet the number of
new applications and the involvement of citizens in mapping and
environmental monitoring have grown explosively. The final chapter of the book
examines what the future trends in VGI might be and the increasing role that
smart cities and society will play in this innovative area. It is clear that the
future for VGI is very bright; the key is to not waste these valuable
citizen-based resources but to find ways to maximise the synergies between
stakeholders across multiple levels of society.


Notes 

1 http://www.citizensensor-cost.eu/


Abstract

The concept of Volunteered Geographic Information (VGI) is often exemplified
by the mapping of features in OpenStreetMap (OSM), yet there are
many other sources of VGI available. Some VGI is very focused on the creation
of map-based products, while in other applications location is simply
one attribute that is routinely collected, due to the proliferation of Global
Positioning System (GPS) enabled devices, e.g. mobile phones and tablets.
This chapter aims to provide an overview of the variety of sources of VGI
currently available, categorised according to whether they can contribute to
framework data (i.e. the type of data that are commonly part of the spatial
data infrastructure of national mapping agencies and governments) or
not and whether the data have been actively or passively collected. A range
of examples are presented to illustrate the different types of VGI in each of
these main categories. Finally, the chapter discusses some of the main issues
surrounding the use of VGI and points to chapters in the book where these
issues are described in more detail.


Keywords

passive data collection, crowdsourcing


1 Introduction

Crowdsourced mapping and citizen-driven spatial data collection are radically
changing the relationship between traditional map production and those
individuals and organisations that consume the data. In the past, authoritative
maps such as road networks and building footprints were firmly in the domain of
national mapping agencies (NMAs), where the maps were created by professionals.
Today NMAs still fulfil this role but they face a relatively new citizen
mapping community, armed with online mapping tools, open access to
very-high-resolution satellite imagery/aerial photography and mobile devices
with GPS (Global Positioning System) for geotagging features. The result has
been an abundance of maps that are created by citizens and a blurring of the
traditional boundaries between map producers and consumers, as citizens take on
the dual role of production and consumption (Coleman et al., 2009; See et al.,
2016b).

At the same time, citizens have become empowered to collect and map features
and objects that are not traditionally mapped by NMAs, such as sentiments
and hiking/biking routes, among many others. OpenStreetMap (OSM)
is one of the most successful and most commonly cited examples (e.g. Fan
et al., 2016; Hagenauer and Helbich, 2012; Haklay, 2010; Jokar Arsanjani et al.,
2015b; Mooney and Corcoran, 2013) of this new phenomenon, referred to in
the geographical literature as Volunteered Geographic Information (VGI), a
term originally coined by Goodchild (2007). Numerous other terms have been
proposed that refer to similar phenomena, all of which have citizens and citizen
participation at their core. In the field of geography and urban planning,
public participation in Geographic Information Systems (PPGIS) appeared in the
late 1990s, as a way of improving the public consultation experience and
fostering public engagement (Kingston et al., 2000; Sieber, 2006), and can be
thought of as a precursor to VGI, when Web 2.0 technologies and online mapping
were still in their infancy. In other fields, for example in ecology,
conservation and biodiversity monitoring, there has been a long tradition of
citizen involvement in science, such as the Audubon Society’s Christmas Bird
Count, which started in the 1900s (LeBaron, 2007). In these domains, citizen
involvement has commonly been referred to as public participation in scientific
research (PPSR) (Bonney et al., 2009a) and more recently as citizen science
(Bonney et al., 2009b), where data collection, often geotagged, is only one
component of citizen participation. In yet another domain, i.e. that of the
business world, the term crowdsourcing has emerged to refer to the outsourcing
of tasks to the crowd (Howe, 2006). Crowdsourcing can be undertaken for
financial remuneration (Buhrmester et al., 2011) or for other, more altruistic
reasons, e.g. searching for the remains of the Malaysia Airlines plane that went
missing in 2014 (Whittaker et al., 2015) or providing hotel and restaurant
reviews on sites like TripAdvisor; other initiatives can be found in Sester
et al. (2014).

Many other terms exist and the reader is referred to a recent review by See
et al. (2016b) for a broader overview. For the purpose of this book, we use the
term VGI to mean geotagged data contributed by citizens, whether map-based
or where location is simply an attribute in a much larger dataset. The term
covers many different domains of activities, from monitoring the weather to
species identification and georeferencing old historical maps contained in
digital libraries. This chapter aims to provide an overview of the variety of
sources of VGI currently available, categorised according to whether they are
framework data (i.e. the type of data that are commonly part of the spatial data
infrastructure of national mapping agencies and governments) or not and whether
the data have been actively or passively collected, as outlined in Section 2
below. A range of examples is then presented in Section 3 to illustrate the
different types of VGI in each of these main categories. Finally, the main
issues that currently surround VGI are highlighted, providing a link to
different chapters in the book that describe these issues in more detail.


2 Categorisation of VGI Sources for Mapping

To  help  organise  the  diverse  range  of  VGI  sources  available  for  mapping,  we 
have  categorised  them  based  on  two  main  criteria.  The  first  one  is  whether  the 
data  fall  into  the  territory  of  NMAs;  we  refer  here  to  such  data  as  ‘framework 
data.  Framework  data  are  typically  data  that  are  collected  by  government  agen¬ 
cies,  and  which  can  be  organised  into  the  following  themes:  geodetic  control, 
orthoimagery,  elevation,  transportation,  hydrography,  governmental  units  and 
cadastre,  and  comprise  the  basic  components  of  a  government’s  spatial  data 
infrastructure (SDI; Elwood et al., 2012). These data are collected by professionals,
with maximum permissible levels of error specified for their production and
update cycles that depend on national budgets but generally range from
one to five years. Depending on the country, the content of these datasets may
also  vary;  for  example,  some  countries  do  not  have  cadastres,  while  others  may 
include  a  gazetteer  as  part  of  their  SDI.  In  the  European  Union,  the  INSPIRE 
(Infrastructure  for  Spatial  Information  in  Europe)  Directive  specifies  the  types 
of  framework  data  that  all  EU  member  states  should  collect  (EC,  2007);  the  type 
of  data  specified  in  the  Directive’s  Annexes  I  and  II  corresponds  to  the  types  of 
data outlined in Elwood et al. (2012), but Annex II additionally includes land
cover  and  geology,  and  Annex  III  contains  much  more  detail  in  terms  of  land 
use  and  socio-economic  data.  For  the  purpose  of  this  chapter,  however,  we  take 
framework  data  to  mean  the  most  basic  components  of  an  SDI  as  outlined  by 
Elwood  et  al.  (2012). 

The  second  criterion  is  whether  the  data  have  been  contributed  actively  or 
passively  (Harvey,  2013).  Active  data  collection  includes  campaigns  that  call 
for  participation  or  where  people  sign  up  to  complete  micro-tasks  with  the  full 
knowledge  that  they  are  contributing  the  data  for  a  specific  purpose,  e.g.  the 
active mapping of features in OSM. In passive mode, participants may be providing
geotagged information willingly, e.g. through social media, but the data
may then be used for purposes that contributors are unaware of, such as
behavioural studies or marketing, because they did not read the terms of
participation in detail or modify their privacy settings (if available). Examples
of this are geotagged tweets from Twitter and geotagged photographs from Flickr
and Instagram. There is a trade-off between the two data sources; active data
are  often  easier  to  process  since  they  were  collected  with  a  specific  purpose  in 
mind  and  often  with  some  type  of  protocol  or  minimum  data  requirements, 
while  passive  data  may  not  meet  the  minimum  requirements  of  an  application. 
In addition, passive data can be ‘big data’ in terms of volume and complexity,
and may thus require considerable post-processing before use. Regardless
of how the data are collected, the potential of this new wave of data collection,
i.e. VGI, for the public and private sectors and for scientific research is yet
to be fully exploited.

Using  these  two  criteria  to  categorise  VGI,  i.e.  framework  vs.  non-framework 
data  and  active  vs.  passive  data  collection,  there  are  four  categories  in  which 
VGI  can  fall.  The  first  category  is  VGI  that  can  contribute  to  framework  data 
and  that  is  actively  contributed  by  volunteers.  In  this  category  fall  projects 
that  can  be  used  to  update  or  correct  the  types  of  data  routinely  collected  by 
NMAs;  the  category  is  represented  by  the  upper  right  quadrant  of  Figure  1. 
The  second  category  is  non-framework  data  (or  data  that  are  not  routinely  col¬ 
lected  by  NMAs  but  are  useful  for  other  agencies  and  scientific  research)  where 
active  participation  by  volunteers  is  evident;  it  is  located  in  the  bottom  right 
quadrant  of  Figure  1.  The  left  half  of  Figure  1  contains  the  other  two  catego¬ 
ries,  i.e.  framework  and  non-framework  data  that  are  passively  collected,  e.g. 
through  social  media  or  sensors  such  as  the  GPS  of  a  mobile  phone.  The  four 
quadrants  in  Figure  1  are  then  populated  with  different  sources  of  VGI;  exam¬ 
ples  of  these  sources  are  provided  in  Section  3.  Note  that  the  exact  location  of 
the  VGI  examples  within  each  quadrant  has  no  significance  -  they  are  simply 
arranged  for  optimal  readability.  A  fifth  category  has  been  added  to  consider 
three-dimensional  VGI;  although  this  type  of  VGI  could  also  be  characterised 
by  the  two  criteria  introduced  in  this  section,  we  provide  a  separate  discussion 
of  it,  focused  on  height  data,  OSM  and  publicly  available  sources  of  elevation, 
in  Section  3.5,  since  this  is  a  new  area  of  VGI. 
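
The two criteria described above can be expressed as a pair of yes/no attributes on each VGI source. The following minimal Python sketch is illustrative only: the class name, field names and function are assumptions introduced here, and the four example sources are taken from the discussion in Section 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VGISource:
    """A VGI source described by the two criteria of this section (hypothetical type)."""
    name: str
    framework: bool  # True if the data type is routinely collected by NMAs
    active: bool     # True if contributors knowingly collect data for the purpose


def quadrant(source: VGISource) -> str:
    """Return the category of Figure 1 that a source falls into."""
    mode = "active" if source.active else "passive"
    kind = "framework" if source.framework else "non-framework"
    return f"{mode} {kind}"


examples = [
    VGISource("OSM feature mapping", framework=True, active=True),
    VGISource("Amateur weather stations", framework=False, active=True),
    VGISource("Google Traffic speed reports", framework=True, active=False),
    VGISource("Geotagged tweets", framework=False, active=False),
]

for source in examples:
    print(f"{source.name}: {quadrant(source)}")
```

In practice a source may not fit cleanly into one category (hiking and biking trails, for example, may or may not count as framework data), so the Boolean flags should be read as a simplification.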




[Figure 1 (diagram). Vertical axis: Framework Data (top) to Non-framework Data (bottom). Horizontal axis: Passive Data Collection (left) to Active Data Collection (right). Entries by quadrant:

Passive framework data: Transport (road networks from sat navs, traffic data from TomTom, Google traffic); Feature mapping by Google via their game Ingress.

Active framework data: Feature mapping (addresses, buildings, elevation, points of interest, protected areas, rivers and canals, road and rail networks); Hiking and biking trails; Gazetteer; Cadastral parcels and other land administrative data; Land cover / Land use.

Passive non-framework data: Google search data; Transport (live feeds of buses, trains, metro); Mobile data / behaviour (store purchases, customer survey data, mobile phone data); Location-based social media (Foursquare, Twitter, Facebook, etc.); Places of interest / travel (geo-tagged photos, videos, stories, advice).

Active non-framework data: Biodiversity (species identification, geo-tagged wildlife images); Crime / public safety; Disaster events (natural and manmade).]


Fig.  1:  Categorisation  of  VGI  based  on  whether  it  consists  of  framework  or 
non-framework  data  and  whether  the  data  have  been  actively  or  passively 
collected.  This  figure  is  modified  from  See  et  al.  (2016b). 


3  Examples  of  VGI  Sources  for  Mapping 


3.1 Active Framework Data


OSM,  as  already  mentioned,  is  one  of  the  most  successful  and  commonly  cited 
examples  of  VGI  sources,  and  aims  at  creating  a  world  map  freely  available  to  any¬ 
one  (Jokar  Arsanjani  et  al.,  2015b).  OSM  is  a  prime  example  of  feature  mapping 
and  covers  data  types  often  found  in  topographic  databases  and  transportation 
networks;  an  extensive  overview  of  this  initiative  is  provided  in  Chapter  3  of  this 
book  (Mooney  and  Minghini,  2017).  Google  Map  Maker1  is  another  example  of  an 
application  that  allows  volunteers  to  map  features  such  as  roads  and  points  of  inter¬ 
est  (POI).  These  are  then  displayed  on  Google  Maps  in  certain  countries  where 
the  review  process  is  well  developed  enough  to  ensure  a  minimum  level  of  quality. 

A  second  example  of  active  framework  data  contributed  by  citizens  is  the 
mapping  of  cadastral  boundaries  and  properties  (Kalantari  and  La,  2015).  This 
is  particularly  relevant  for  developing  countries  where  land  rights  are  not  well 
documented.  This  is  also  relevant  in  places  where  surveying  is  very  expensive 
and  time-consuming  and  so  has  not  been  carried  out  in  all  areas,  which  leads 
to  a  stagnation  in  the  property  market.  An  example  from  Greece  is  outlined  by 
Basiouka and Potsiou (2012), who conducted an experiment in the rural part of
the  village  of  Tsoukalades,  on  the  island  of  Lefkada,  where  fifteen  volunteer  land 
owners  used  a  handheld  GPS  to  delineate  their  land  parcel  boundaries.  When 
the  results  were  compared  with  an  official  survey,  the  locations  and  shapes  of 
all  parcels  were  found  to  be  correct  and  the  majority  of  the  parcels  had  area 
calculations  that  were  within  the  tolerance  limits  of  the  specifications  set  by 
the  Hellenic  Cadastre.  Moreover,  the  land  owners  wanted  to  be  involved  in  the 
collection  of  these  data  and  hence  motivation  was  high.  Thus,  citizen  involve¬ 
ment  holds  great  potential  for  helping  to  gather  this  type  of  framework  infor¬ 
mation.  In  a  more  recent  study  by  Basiouka  et  al.  (2015),  surveying  students 
were  tasked  with  assessing  the  feasibility  of  using  OSM  for  cadastral  mapping 
in  Athens,  Greece.  The  results  showed  good  accuracy,  low  costs,  and  ease-of- 
use  for  non-experts,  indicating  that  OSM  is  one  possible  solution  for  crowd¬ 
sourcing  land  parcels  and  features,  particularly  if  adopting  a  hybrid  solution 
in  which  surveying  experts  are  used  in  training  and  quality  assurance.  Mobile 
phones  can  also  be  used  for  securing  land  rights;  GeoODK  (Geographic  Open 
Data  Kit)  is  an  Android-based  mobile  phone  app  for  spatial  and  attribute  data 
collection  that  is  being  used  by  the  Cadasta  Foundation2  to  help  people  map 
their  lands  and  resources  and  assert  their  rights. 
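
The comparison of volunteer-surveyed parcels with an official survey, as in the Lefkada experiment described above, can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in that study: it assumes parcel vertices are given in a projected coordinate system in metres, and the relative tolerance `rel_tol` is a hypothetical parameter, not a value taken from the Hellenic Cadastre specifications.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    vertices: list of (x, y) pairs in a projected CRS (metres), in order.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def within_tolerance(volunteer_area, official_area, rel_tol):
    """True if the volunteer area differs from the official one by at most rel_tol (a fraction)."""
    return abs(volunteer_area - official_area) <= rel_tol * official_area


# Hypothetical parcel: official survey vs. a handheld-GPS trace by the land owner.
official = [(0, 0), (100, 0), (100, 50), (0, 50)]
volunteer = [(0, 0), (101, 0), (101, 50), (0, 50)]

a_official = polygon_area(official)
a_volunteer = polygon_area(volunteer)
print(a_official, a_volunteer, within_tolerance(a_volunteer, a_official, rel_tol=0.02))
```

A real assessment would also compare the position and shape of each boundary, not only the parcel area.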

In the area of gazetteers, Wikimapia3 is a well-known initiative that aims
to  describe  places  in  the  world  (Goodchild,  2007).  It  is  freely  available  and  all 
the  content  is  provided  by  volunteers.  Users  can  mark  places,  add  descriptions 
with  links  and  upload  and  categorise  photos.  Entries  are  then  voted  on  by  a 
group  of  peers.  To  access  the  raw  data,  the  Wikimapia  API  and  Motomapia4 
are  available.  GeoNames5  is  another  gazetteer,  containing  over  10  million  geo¬ 
graphical  names  and  available  to  download  free  of  charge:  volunteers  can  con¬ 
tribute  by  editing  existing  names  or  adding  new  names  through  the  GeoNames 
website. 

Mapping  of  land  cover  and  land  use  is  another  area  of  framework  data.  Some 
of  the  current  authoritative  products  have  been  created  globally,  e.g.  Globe- 
Land30  (Chen  et  al.,  2015);  regionally,  such  as  CORINE  land  cover6  for  EU 
countries  or  AFRICOVER  for  some  African  countries  (FAO,  1998);  and  nation¬ 
ally  by  NMAs,  e.g.  the  land  cover  map  of  Great  Britain  produced  by  the  Centre 
for  Ecology  and  Hydrology  (Fuller  et  al.,  2002).  These  authoritative  products 
use  satellite  and  aerial  imagery  in  combination  with  different  types  of  classifica¬ 
tion  algorithms,  and  there  is  often  a  long  period  of  time  between  updates  due  to 
the  difficulty  of  the  task.  One  problem  that  has  been  highlighted  by  researchers 
is  that  when  these  maps  are  compared  spatially,  there  are  often  areas  where  they 
disagree  (Fritz  et  al.,  2011).  Several  efforts  have  been  undertaken  to  tackle  this 
problem,  with  a  promising  contribution  from  VGI.  For  example,  the  Geo-Wiki 
tool7  for  crowdsourcing  land  cover  data  asks  volunteers  to  interpret  very-high- 
resolution  satellite  imagery  from  Google  Earth  and  Bing  to  increase  the  amount 
of  in-situ  data  for  producing  and  validating  land  cover  products  (Fritz  et  al., 
2012; See et al., 2015). One of the latest Geo-Wiki applications is called
FotoQuest Austria8, and, in contrast to the online Geo-Wiki applications, encourages
volunteers to go out into the field and collect land cover and land use
information  using  a  mobile  app.  The  idea  behind  the  project  is  to  see  whether 
volunteers  can  collect  in-situ  data  based  on  the  Land  Use  and  Coverage  Area 
frame  Survey  (LUCAS)  protocol  (Eurostat,  2015)  and  complement  this  author¬ 
itative  data  source.  LUCAS  is  currently  the  only  official  validation  dataset  for 
products such as CORINE land cover and the very-high-resolution (VHR) layers
produced as part of the Copernicus land monitoring service (Büttner and
Eiselt, 2013; Gallego, 2011). Thus, any additional in-situ data have great value
for calibration and validation of products from Earth Observation, especially
in terms of density and frequency of updating (See et al., 2016a). Initial results
from  a  comparison  of  land  cover  and  land  use  data  collected  from  the  app  with 
the  authoritative  LUCAS  data  indicate  that  volunteers  are  able  to  identify  basic 
land  cover  and  land  use  types  on  the  ground  but  that  more  detailed  land  cover 
types  will  require  some  training  (Laso  Bayas  et  al.,  2016).  The  app  is  currently 
being  rolled  out  to  other  EU  countries.  Similar  tools  to  Geo-Wiki  have  been 
developed  by  other  research  teams.  For  example,  the  VIEW-IT  application 
(Clark  and  Aide,  2011)  is  a  collaborative  effort  to  record  reference  information 
on land use and land cover, while Google Earth Grids (Jacobson et al., 2015)
allows  users  to  create  an  interactive  and  user-specified  grid  over  Google  Earth 
imagery  and  identify  the  land  cover  in  each  square  of  the  grid. 

As  shown  in  Figure  1,  a  final  area  where  VGI  has  been  used  to  actively  map 
framework  data  is  that  of  biking  and  hiking  trails  (which  may  or  may  not 
appear  in  the  topographic  databases  of  NMAs;  thus  this  category  could  also 
be  included  in  active  non-framework  data).  An  example  of  such  an  initiative 
is  MapMyFitness9,  which  is  a  suite  of  mobile  apps  and  websites  that  provide 
interactive tools to map and share fitness activities including running, walking,
cycling and hiking10. Each of these provides paths and trails that could be
incorporated into the topographic database of an NMA. Bikemap11 and Bikely12
are  other  examples  of  initiatives  to  map  bike  routes,  with  many  more  examples 
to  be  found  online.  Bikemap  has  more  than  2.8  million  cycling  routes  available, 
where  the  routes  are  accessible  via  the  web  interface  and  also  through  the  API, 
while  routes  in  Bikely  can  be  accessed  via  the  web  interface  or  downloaded 
in  GPX  and  KML  formats.  Finally,  there  are  many  hiking  sites  available.  An 
example  is  AllTrails13,  which  is  a  platform  for  sharing  geotagged  user-generated 
travel  content.  Travel  experiences  are  shared  through  an  interactive  map  and 
can  include  photographs  plotted  along  the  trip  route;  mobile  apps  and  a  devel¬ 
oper  API  are  available  to  access  the  platform  and  manage  the  data.  Wikiloc14, 
with  more  than  2  million  users,  around  5  million  outdoor  trails  and  8  million 
photographs,  is  very  popular  for  discovering  and  sharing  the  best  trails  for  out¬ 
door  activities,  and  offers  routes  and  waypoints  (POIs)  along  with  elevation 
profiles,  distances  and  images  taken. 


3.2 Active Non-framework Data

In  contrast  to  active  framework  data,  there  are  many  diverse  examples  of  ini¬ 
tiatives  for  active  non-framework  data.  It  is  not  possible  to  comprehensively 
list  all  of  them  or  even  touch  upon  every  domain  in  which  these  initiatives  are 
emerging,  as  this  is  a  very  dynamic  area:  the  reader  is  referred  to  sites  such  as 
those  of  SciStarter15  and  the  Citizen  Science  Alliance16,  which  are  portals  to 
many  other  citizen  science  projects.  Not  all  are  spatially-oriented  but  location 
is  usually  a  key  attribute  collected  by  citizens.  Here  we  have  chosen  to  focus  on 
five main areas shown in Figure 1: weather, biodiversity, environment, disasters
and crime.

Amateur  weather  stations  are  a  prime  example  of  active  data  contribu¬ 
tions  and  have  become  important  sources  of  information  for  applications  in 
hydrology,  drought,  agriculture,  engineering  and  architecture,  among  others 
(Doesken  and  Reges,  2010).  The  US  National  Weather  Service  Cooperative 
Observer Program is a weather observing network of more than 8,700
volunteers  who  provide  observations  from  farms,  urban  areas,  national  parks, 
coastlines  and  mountaintops  within  the  US  (Leeper  et  al.,  2015).  There  are  other 
similar  initiatives,  such  as  the  Citizen  Weather  Observer  Program17,  which  col¬ 
lects  data  from  more  than  7,000  stations  in  North  America  and  sends  around 
50,000  to  75,000  observations  every  hour,  and  Weather  Underground18,  which 
is  a  weather  service  that  provides  real-time  weather  information  for  free  over 
the  Internet  and  incorporates  data  from  more  than  200,000  personal  weather 
stations  around  the  world.  Other  notable  initiatives  include  CoCoRaHS,  which 
is  a  community-based  network  of  volunteers  who  measure  and  map  precipi¬ 
tation  in  the  form  of  rain,  hail  and  snow,  and  a  mobile  app  called  mPING19, 
which allows users to contribute weather reports. As of mid-2015, CoCoRaHS
volunteers had submitted over 31 million daily precipitation reports and tens
of  thousands  of  reports  of  hail,  heavy  rain  and  snow  (Reges  et  al.,  2016),  while 
the  data  collected  through  mPING  are  used  to  fine-tune  weather  forecasts. 

Biodiversity  monitoring  is  the  second  area  where  volunteers  have  been 
actively  contributing  non-framework  data.  There  are  hundreds  of  different  citi¬ 
zen  science  projects  in  this  area,  mainly  because  there  is  a  long  history  of  citi¬ 
zen  involvement  in  conservation,  as  mentioned  previously.  Some  of  these  are 
local  projects,  collecting  data  on  a  small  scale,  while  others  have  more  global 
reach.  An  example  of  a  more  local  project  is  the  Invaders  of  Texas  Program, 
where  citizen  scientists  are  trained  to  detect  the  arrival  and  dispersal  of  invasive 
species  and  report  them  using  the  online  mapping  database  (Gallo  and  Waitt, 
2011).  iSpot20  and  iNaturalist21  are  initiatives  with  global  reach  and  both  have 
mobile  apps  for  data  collection,  where  the  data  collected  by  citizens  have  been 
used  in  scientific  research  (e.g.  Silvertown  et  al.,  2015). 

Citizens  are  also  active  in  monitoring  the  environment.  Global  Water 
Watch22, which is a voluntary network that monitors surface waters for the
improvement  of  both  water  quality  and  public  health,  is  a  prime  example  of 
such  monitoring.  Another  example  is  the  Global  Learning  and  Observations 
to  Benefit  the  Environment  (GLOBE)  Program,  which  aims  to  increase  envi¬ 
ronmental  awareness  and  to  actively  involve  schools  in  science;  there,  students 
perform  measurements  that  are  of  research  quality  and  report  their  observa¬ 
tions  to  archives  designed  for  the  study  of  the  Earth.  Since  1995,  the  GLOBE 
network  has  grown  to  include  representatives  from  112  countries.  One  of  the 
environmental  parameters  measured  in  the  framework  of  the  GLOBE  Program 
is  air  pollution  in  terms  of  aerosols.  In  addition  to  creating  awareness  about 
aerosols  and  their  role  in  climate  and  air  quality,  the  measurements  can  be  of 
significant  value  for  validation  of  satellite  products  (Brooks  and  Mims,  2001; 
Boersma  and  de  Vroom,  2006).  More  recently,  the  EU  has  funded  four  citi¬ 
zen  observatories23  covering  different  aspects  of  citizen-based  environmental 
monitoring: Citi-Sense (air pollution); Omniscientis (odours); CobWeb (land
cover and land use); and WeSenseIt (flooding).

Another  environmental  issue  in  cities,  especially  in  dense  urban  areas,  is 
noise,  which  can  become  a  public  health  issue  in  extreme  cases.  NoiseWatch24 
is  a  citizen  science  project  supported  by  the  European  Environment  Agency 
that  integrates  noise  data  from  official  scientific  sources  with  noise  data  col¬ 
lected  from  crowdsourced  observations.  A  mobile  application  can  be  used  by 
citizens  to  measure  the  level  of  noise  in  their  location,  which  is  automatically 
uploaded  to  a  central  database.  These  data  can  then  be  used  to  develop  noise 
maps  for  decision-making.  Finally,  in  the  area  of  light  pollution,  the  Cities  at 
Night25  initiative  is  a  citizen  science  project  to  help  georeference  photographs 
of  cities  taken  by  astronauts  on  the  International  Space  Station  at  night.  Using 
these  images,  it  is  possible  to  compare  the  efficiency  of  lighting  across  different 
cities  on  the  planet  as  well  as  study  their  light  pollution,  which  can  have  a  nega¬ 
tive  effect  on  ecosystems  and  health  (Falchi  et  al.,  2011). 

The  fourth  area  of  active  non-framework  data  collection  is  in  disaster  map¬ 
ping.  The  Humanitarian  OpenStreetMap  Team  (HOT)26  is  an  initiative  that 
rallies  a  huge  network  of  volunteers  when  disaster  strikes  to  create  maps  that 
enable  responders  to  reach  those  in  need.  HOT  was  launched  after  the  January 
12,  2010  Haiti  earthquake,  when  600  remotely  located  volunteer  mappers  built 
a  base  layer  map  to  support  the  aid  effort  (Soden  and  Palen,  2014).  HOT  vol¬ 
unteers  were  also  effectively  mobilised  during  the  November  8,  2013  Typhoon 
Yolanda  in  the  Philippines  (Palen  et  al.,  2015).  Going  back  to  earthquakes,  Did 
You  Feel  It?27  is  an  initiative  from  the  United  States  Geological  Survey  (USGS) 
that  maps  where  earthquakes  were  experienced  by  individuals  and  the  sever¬ 
ity  of  the  damage.  Any  citizen  who  feels  an  earthquake  can  report  it  online  by 
selecting  the  earthquake  from  a  real-time  map  of  earthquakes  and  filling  in  a 
survey  with  detailed  questions  on  their  experiences  as  well  as  their  location. 

The  final  area  being  considered  here  is  crime  and  public  safety.  Citizens  are 
willing  to  contribute  especially  when  they  feel  threatened.  Alertos28  is  a  citizen 
observation platform to report crime and similar events to the legal authorities
in  Guatemala,  Latin  America.  An  interactive  map  showing  reported  events  by 
category  and  time  is  also  available  on  the  website.  WikiCrimes29  is  a  collabora¬ 
tive  wiki-type  initiative  to  report  crime  events  of  different  categories  through 
the  website.  Such  events  can  then  be  visualised  and  filtered  using  an  interactive 
map.  Mobile  apps  are  also  available  to  provide  users  with  information  on  the 
safety  of  a  place  based  on  the  analysis  of  the  reported  events.  CrimeReports30 
and  SpotCrime31  are  examples  of  similar  initiatives  for  reporting  data  on  differ¬ 
ent  types  of  crimes  in  the  US,  Canada  and  the  UK.  Emotional  and  perception 
mapping  is  another  area  where  initiatives  have  emerged  to  understand  the  level 
of  security  perceived  by  citizens  and  their  spatial  distribution.  Measuring  the 
fear  of  crime  has  been  undertaken  as  part  of  a  research  project  developed  at 
Óbuda University, Alba Regia Technical Faculty, Institute of Geoinformatics:
contributors are asked to fill in an online survey32 and draw a red or grey polygon
to indicate areas where they feel unsafe or safe, respectively. Finally, the Ushahidi
platform33 was used to map reports of post-election violence in Kenya
in 2008. Since then several initiatives have used this platform
to empower citizens to report different events, e.g. the Map it. End it34 initiative
to map technology-related violence against women and the Egyptian Zabatak35
initiative.


3.3  Passive  Framework  Data 

There  are  not  many  examples  of  passive  framework  data  collection  but  such 
collection  does  exist,  e.g.  through  the  Google  Traffic  application:  through  a 
smartphone  with  the  Google  Maps  app  installed  and  the  location  functionality 
activated,  users  continuously  send  Google  anonymous  data  on  how  fast  they 
are  moving.  Google  then  analyses  the  data  coming  in  from  the  same  location 
and  sends  back  accurate  information  on  traffic  conditions.  Such  information 
on  traffic  volumes  and  hotspots  can  be  used  to  improve  road  planning  (see  e.g. 
Barth,  2009)  as  well  as  road  mapping  (Ekpenyong  et  al.,  2009).  Satellite  naviga¬ 
tion  companies  also  gather  traffic  and  travel  data  from  their  customers’  devices 
in  a  passive  mode.  In  addition,  the  TomTom  satellite  navigation  company  has 
developed  the  Map  Share  Reporter36  as  a  way  of  allowing  customers  to  make 
active  changes  to  the  map  and  share  these  with  other  TomTom  users.  Thus,  they 
are  crowdsourcing  improvements  to  their  product. 
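
The idea of turning many anonymous speed reports into a picture of traffic conditions can be illustrated with a short Python sketch. It is not a description of how Google or any satellite navigation company processes its data: the road segment identifiers, the use of the median as the summary statistic, and the congestion thresholds are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def segment_speeds(reports):
    """Summarise anonymous speed reports per road segment.

    reports: iterable of (segment_id, speed_kmh) pairs.
    Returns a dict mapping segment_id to the median reported speed.
    """
    by_segment = defaultdict(list)
    for segment_id, speed_kmh in reports:
        by_segment[segment_id].append(speed_kmh)
    return {seg: median(speeds) for seg, speeds in by_segment.items()}


def congestion_level(observed_kmh, free_flow_kmh):
    """Classify a segment by the ratio of observed to free-flow speed (hypothetical thresholds)."""
    ratio = observed_kmh / free_flow_kmh
    if ratio >= 0.75:
        return "free"
    if ratio >= 0.4:
        return "slow"
    return "congested"


# Hypothetical reports from phones travelling along two road segments.
reports = [("A", 50), ("A", 54), ("A", 46), ("B", 10), ("B", 14)]
speeds = segment_speeds(reports)
print(congestion_level(speeds["A"], free_flow_kmh=60))
print(congestion_level(speeds["B"], free_flow_kmh=50))
```

The median is used here because it is less sensitive than the mean to a single stationary or unusually fast device.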

Another  example  is  the  crowdsourcing  of  features  using  gamification  via 
the  Google  Ingress  game37  to  improve  Google  Maps.  The  idea  behind  the 
game  is  to  find  a  portal  and  capture  it.  In  the  process  of  doing  this,  players 
are  asked  to  travel  on  specific  routes  and  photograph  locations  or  features 
along  their  way  to  the  portal.  In  this  way  Google  gathers  information  from 
the  players.  The  main  goal  of  the  players  is  to  gain  control  over  the  portals  and 
have  fun,  so  the  data  collection  has  been  seamlessly  integrated  into  the  game. 
This is an example of a very cleverly disguised way of updating map features
through  crowdsourcing. 


3.4  Passive  Non-framework  Data 

Several examples can be found in the category of non-framework data contributed
passively by citizens, and these data can be mapped and analysed for different
applications. The Google search engine is used approximately 3.5 billion times
per day38, and Google collects the search terms along with other data such
as the location where the search has been made. This allows Google to analyse
a  vast  amount  of  data,  e.g.  trends  in  influenza  based  on  frequency  of  searching 
(Ginsberg  et  al.,  2009).  To  allow  researchers  to  analyse  the  data  using  their 
own  queries,  Google  has  developed  some  online  tools.  For  example,  Google 
Trends39  is  a  tool  that  shows  the  frequency  of  a  particular  search  term  relative 
to the total search volume across various regions of the world, and in various
languages. Choi and Varian (2012) demonstrated how Google Trends can
help to predict current phenomena much more quickly than the usual reporting
process in diverse areas such as motor vehicles and parts, initial claims for
unemployment benefits or travel planning. Another tool called Google Correlate40
works in the reverse way. Users upload a time series or spatial pattern
of interest and the software returns the queries that best mimic the data
(Mohebbi  et  al.,  2011):  Google  calculates  a  correlation  coefficient  between  the 
uploaded  time  series  and  the  time  series  of  every  query  in  their  database,  and 
the  results  displayed  are  those  queries  that  generate  the  highest  correlation 
with  the  uploaded  data. 
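
The ranking step described above can be sketched as a brute-force search in Python. This is only an illustration of the principle: the use of the Pearson coefficient, the function names and the toy query series are assumptions, and a production system working over a very large query database would need a far more efficient search than the exhaustive loop shown here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no pattern to correlate with
    return cov / sqrt(var_x * var_y)


def best_matching_queries(target, query_series, top_n=3):
    """Return the top_n (query, correlation) pairs, highest correlation first."""
    scored = [(pearson(target, series), query) for query, series in query_series.items()]
    scored.sort(reverse=True)
    return [(query, r) for r, query in scored[:top_n]]


# Hypothetical uploaded series and a toy "database" of query frequencies.
uploaded = [1, 2, 3, 4, 5]
database = {
    "flu symptoms": [2, 4, 6, 8, 10],
    "ski resorts": [5, 4, 3, 2, 1],
    "weather": [3, 3, 3, 3, 3],
}

for query, r in best_matching_queries(uploaded, database, top_n=2):
    print(f"{query}: r = {r:.2f}")
```

A high correlation only shows that two series move together; it does not by itself show that the search term is meaningfully related to the uploaded phenomenon.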

Another  big-data  source  of  passively  collected  non-framework  data  is  real¬ 
time  transport  information  such  as  live  feeds  from  buses,  metro  stations, 
bike scheme data, trains, etc. APIs are available to retrieve the data, which can
be brought together in dashboard-type applications that provide information
on the status of different transportation systems in real time, the weather, air
pollution, electricity demand, etc. For example, the CityDashboard project41
was  developed  by  the  Centre  for  Advanced  Spatial  Analysis  at  UCL,  London, 
and  is  available  for  a  number  of  UK  cities.  The  CityDashboard  data  have  also 
been  used  to  extract  useful  information  for  other  purposes  such  as  generating 
insights  into  sustainable  transport  systems  (O’Brien  et  al.,  2014)  or  the  health 
impact  of  bicycle  sharing  systems  (Woodcock  et  al.,  2014);  for  example,  the 
Bike  Share  Map42  shows  the  status  of  biking  system  docks  in  real-time  for  sev¬ 
eral  cities  around  the  world.  Uniman  et  al.  (2010)  used  data  from  the  Oyster 
Smart  Card  (public  transport  card  for  the  London  Underground)  to  determine 
the  reliability  of  the  Underground  system.  Using  data  on  the  entries  and  exits 
to/from  London  Underground  stations,  they  developed  metrics  based  on  the 
travel time of passengers. This type of big data (there are more than 1.3
billion metro and 2.4 billion bus journeys annually in London; Transport for
London, 2015) has great potential for improving passenger experiences and for
planning future transport projects.
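
A travel-time reliability metric of the kind derived from station entry and exit records can be sketched as follows. The sketch is an illustration under stated assumptions, not a reproduction of the metrics in Uniman et al. (2010): it assumes journeys between one origin-destination pair are available as (entry, exit) times in minutes, and it measures reliability as the gap between a high percentile (here the 95th, an assumed choice) and the median travel time.

```python
def percentile(sorted_values, p):
    """Linearly interpolated percentile of an ascending list, with p in [0, 100]."""
    if not sorted_values:
        raise ValueError("no values supplied")
    k = (len(sorted_values) - 1) * p / 100
    lower = int(k)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = k - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def reliability_buffer(journeys, upper_percentile=95):
    """Extra minutes a passenger should allow beyond the median travel time.

    journeys: iterable of (entry_min, exit_min) pairs for one origin-destination pair.
    """
    times = sorted(exit_min - entry_min for entry_min, exit_min in journeys)
    return percentile(times, upper_percentile) - percentile(times, 50)


# Hypothetical smart card records: four typical journeys and one delayed journey.
journeys = [(0, 20), (5, 26), (10, 32), (15, 38), (20, 60)]
print(round(reliability_buffer(journeys), 1))
```

A larger buffer indicates a less reliable service, since passengers must allow more time to be reasonably sure of arriving on schedule.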

Mobile  phone  data  from  communication  network  operators  represent 
another  big-data  source  of  passively  collected  non-framework  data.  These 
data  have  been  analysed  to  investigate  applications  in  areas  such  as  transpor¬ 
tation  planning  (Di  Lorenzo  et  al.,  2016),  user  behaviour  (Bianchi  et  al.,  2016), 
public  health  (Oliver  et  al.,  2015),  the  spatial  spread  of  diseases  such  as  chol¬ 
era  (Bengtsson  et  al.,  2015)  or  population  displacement  after  a  major  disaster 
(Wilson  et  al.,  2016). 

A  fourth  area  of  passively  collected  non-framework  data  is  travel  websites 
and travel blogs, where all of the information provided is attached to a location
and can therefore be mapped. TripAdvisor is the world’s largest travel site,
where users rate their accommodation, restaurants and attractions, providing
their collective intelligence to the system. Any user can then access this
information for free to make informed decisions. There are many examples of booking
sites that draw upon TripAdvisor or have their own rating system based
upon  user  feedback,  e.g.  Booking.com  and  Trivago,  among  many  others. 

Social  media  websites  such  as  Facebook  and  Twitter  are  also  prime  examples 
that  fall  within  this  category  of  passive  non-framework  data  collection;  infor¬ 
mation  can  be  shared  with  location  data,  depending  on  whether  users  enable 
this  option  in  the  application.  Geotagged  tweets  are  now  being  used  in  a  num¬ 
ber  of  applications,  mostly  related  to  crisis  events  and  disaster  management. 
For  example,  Twitter  was  used  during  the  2010  Pakistan  floods  (Murthy  and 
Longwell,  2013)  and  tweets  were  an  active  source  of  information  during  flood¬ 
ing  in  Jakarta,  allowing  for  the  creation  of  open  source  flood  maps  through  the 
Peta  Jakarta  initiative43. 

Finally,  websites  that  allow  users  to  share  geotagged  photographs  are  included 
in  this  category.  Panoramio,  Flickr  and  Instagram  are  a  few  examples  of  such 
initiatives.  Users  upload  their  photographs  along  with  additional  information 
such  as  date  and  time,  textual  tags  and  geotags,  among  others,  making  it  pos¬ 
sible  to  map  the  photographs.  Research  has  been  conducted  to  explore  ways  to 
use  such  data  for  different  applications  including  land  cover  and  land  use  map¬ 
ping  (Estima  and  Painho,  2014;  Antoniou  et  al.,  2016). 


3.5  3D  VGI

The  third  dimension  in  geospatial  data  is  height  or  elevation.  Height  is  now 
being  added  by  volunteers  to  mapping  initiatives  such  as  OSM,  e.g.  the  heights 
of  buildings  and  roof  geometry,  which  means  that  3D  models  of  cities  can  be 
created  from  VGI  (Goetz  and  Zipf,  2013).  Height  values  of  GPS  traces  in  OSM 
also  show  a  promising  way  of  retrieving  3D  information  for  elaborating  height 
information  from  SRTM  and  ASTER  DEM  models  (John  et  al.,  2016).  A  3D 
model  of  a  city  can  be  generated  using  a  GIS  package  or  via  OSM-3D,  which 
allows  OSM  to  be  visualised  as  a  3D  model  on  a  virtual  globe  (Over  et  al.,
2010).  However,  height  information  is  still  not  commonly  added  to  buildings
on  OSM,  with  less  than  1.5%  of  buildings  having  height  information  available 
in  November  2011  (Goetz  and  Zipf,  2013).  If  more  height  data  were  added  to 
OSM,  it  would  open  up  many  possibilities  for  urban  planning,  transportation 
planning,  navigation  and  disaster  management,  among  others,  particularly  in 
locations  where  an  SDI  is  currently  lacking. 
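
As a rough illustration of how volunteered height data could feed a 3D city model, the following Python sketch derives an extrusion height for a building from its tags. It assumes the OSM keys `height` (in metres) and `building:levels`. The function name is hypothetical, and the fallback of 3 m per level is an arbitrary assumption made for this example rather than a value taken from the studies cited above.

```python
from typing import Optional

# Assumed storey height in metres; an illustrative default, not a value from the text.
METRES_PER_LEVEL = 3.0


def estimate_height(tags: dict) -> Optional[float]:
    """Return an extrusion height in metres for a building, or None if unknown."""
    raw = tags.get("height")
    if raw is not None:
        text = raw.strip()
        if text.endswith("m"):
            text = text[:-1]  # Drop a trailing unit such as "10 m".
        try:
            return float(text)
        except ValueError:
            pass  # Unparseable value: fall through to the level-based estimate.
    levels = tags.get("building:levels")
    if levels is not None:
        try:
            return float(levels) * METRES_PER_LEVEL
        except ValueError:
            return None
    return None


# Hypothetical tag sets for three buildings.
for tags in ({"height": "12.5"}, {"building:levels": "4"}, {"building": "yes"}):
    print(tags, "->", estimate_height(tags))
```

Buildings without either tag return `None`, which mirrors the situation described above: for most buildings no height can be derived until contributors add the information.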

Elevation  data  are  publicly  available  through  NASA’s  Shuttle  Radar
Topography  Mission  (SRTM)  at  a  resolution  of  30  m.  A  new  source  of  higher
resolution  elevation  data,  which  are  being  collected  by  volunteers,  is  Unmanned 
Aerial  Vehicles  (UAVs).  When  DEMs  generated  using  UAVs  were  compared 
with  DEMs  from  LIDAR  in  the  context  of  hydrological  modelling  (Leitao  et  al., 
2016),  the  results  were  promising  and  UAVs  represented  an  affordable  option 
for  3D  mapping.  UAVs  are  also  used  in  mapping  damage  after  a  disaster  event
(Adams  and  Friedland,  2011).  To  accommodate  the  growing  source  of  aerial 
imagery  from  UAVs  and  other  freely  available  satellite  imagery,  Development 
Seed  and  HOT  have  developed  OpenAerialMap44,  which  is  a  new  service  for 
contributing  to  and  accessing  this  new  source  of  data  from  volunteers. 


4  Issues  Related  to  VGI  for  Mapping 

One  of  the  main  issues  that  is  always  raised  with  VGI,  and  is  often  perceived 
as  a  barrier  to  its  further  use,  is  the  quality  of  the  data.  For  this  reason  a  con¬ 
siderable  quantity  of  literature  has  appeared  on  this  topic  (see  e.g.  Antoniou 
and  Skopeliti,  2015;  Bordogna  et  al.,  2015;  Flanagin  and  Metzger,  2008;  Jokar 
Arsanjani  et  al.,  2015a).  There  is  an  ISO  standard  for  spatial  quality  that  can  be 
applied  to  VGI,  but  additional  quality  indicators  are  required  due  to  the  char¬ 
acteristics  that  are  specific  to  VGI.  This  ISO  framework,  along  with  additional 
quality  indicators,  is  discussed  in  more  detail  in  Chapter  7  by  Fonte  et  al.  (2017). 
Quality  is  of  particular  interest  to  NMAs,  some  of  which  see  the  possibility  of 
using  VGI  as  a  way  to  potentially  update  maps  that  would  otherwise  only  be 
re-surveyed  professionally  every  few  years,  or  view  VGI  as  a  complementary
source  of  information  of  a  richer  nature,  e.g.  footpaths  and  cycle  paths  that  may
not  be  mapped.  NMA  experiences  of  VGI  for  these  purposes  are  documented
in  Chapter  13  by  Olteanu-Raimond  et  al.  (2017),  including  the  barriers  to  the 
adoption  of  this  source  of  information.  Demetriou  et  al.  (2017)  in  Chapter  12 
consider  the  broader  question  of  integrating  VGI  with  SDIs  and  how  this  might 
be  achieved  in  the  future. 

Another  key  issue  that  is  commonly  discussed  in  relation  to  VGI,  in  particular 
active  VGI  projects,  is  how  to  recruit  participants,  keep  them  motivated  and  sustain
the  project  in  the  future  (see  e.g.  Coleman  et  al.,  2009;  Nov  et  al.,  2010;  Reed
et  al.,  2013).  However,  more  research  is  still  needed  that  looks  into  what  constitutes
effective  incentives  for  participation  and  how  citizens  can  be  mobilised  to
participate  in  ways  that  are  mutually  beneficial  to  them  while  contributing  VGI.
These  aspects  of  recruitment,  motivation  and  sustainability  are  covered  in  detail  in 
Chapter  5  by  Fritz  et  al.  (2017),  where  the  authors  review  a  series  of  crowdsourc¬ 
ing  initiatives  in  a  comparative  analysis  on  recruitment  strategies,  techniques  for 
motivation  and,  more  generally,  issues  of  sustainability. 

The  involvement  of  citizens  in  VGI  immediately  raises  critical  questions 
regarding  copyright,  ownership,  data  privacy  and  licensing  of  the  data,  par¬ 
ticularly  when  the  data  contributed  by  citizens  are  then  integrated  with  third 
party  base  layers  (see  e.g.  the  work  by  Saunders  et  al.  (2012)  within  a  Canadian 
context).  There  are  also  ethical  issues  with  VGI  data  use  with  respect  to  health 
and  disease  surveillance  (Blatt,  2015).  The  chapter  by  Mooney  et  al.  (2017)  on 
privacy,  ethics  and  legal  issues  tackles  these  concerns  in  more  detail. 

Finally,  there  is  a  new  trend  in  the  development  of  citizen  observatories,
which  are  defined  as  a  framework  that  combines  participatory  community 
monitoring  (including  policy-makers,  scientists  and  other  stakeholders)  with 
technology  such  as  web  portals,  mobile  devices  and  low-cost  sensors  (Liu  et  al., 
2014).  This  new  trend  is  the  subject  of  Chapter  15  by  Liu  et  al.  (2017). 


5  Conclusions 

This  chapter  provided  an  overview  of  sources  of  VGI  for  mapping,  categorised 
according  to  whether  the  data  are  collected  by  government  agencies  as  part 
of  an  SDI  (i.e.  framework  data)  or  in  other  domains  (e.g.  weather  or  ecology, 
among  others),  as  well  as  according  to  the  mode  of  data  collection,  i.e.  active  or 
passive.  A  range  of  examples  were  then  provided  to  illustrate  the  different  types 
of  VGI  that  fall  into  these  categories.  3D  VGI  was  discussed  as  a  special  case. 
With  advances  in  technology,  e.g.  3D  mobile  phones,  and  the  increasing  interest 
in  UAVs,  many  new,  low-cost  solutions  will  emerge,  from  biomass  mapping  to 
hydrological  modelling  to  smart  cities  applications.  Finally,  the  chapter  intro¬ 
duced  some  of  the  main  issues  surrounding  the  use  of  VGI,  including,  among 
others,  quality,  participant  recruitment  and  motivation  and  the  trend  toward 
citizen  observatories,  which  are  the  subjects  of  different  chapters  throughout 
the  book.  New  advances  in  data  mining  and  knowledge  discovery  techniques 
may  also  help  to  improve  the  quality  of  VGI  in  the  future. 

The  wide  range  of  VGI  as  a  data  source  for  mapping  illustrates  the  growing 
interest  in  collecting  and  using  these  data  for  many  different  purposes.  VGI  has 
the  potential  to  complement  but  also  rival  more  traditional  mapping  sources 
in  both  quality  and  richness.  What  has  been  presented  here  is  only  the  start  of 
a  growing  citizen-based  contribution  to  many  different  domains.  Many  of  the 
sources  listed  in  this  chapter  will  disappear,  only  to  be  replaced  by  many  other 
projects  and  initiatives  in  the  future.  For  NMAs,  the  key  will  be  the  successful 
engagement  of  citizens  in  helping  to  update  and  correct  the  more  authoritative 
sources  in  such  a  way  that  both  entities  benefit  in  the  long  run. 


Notes

1  https://www.google.com/mapmaker 

2  http://cadasta.org/ 

3  http://wikimapia.org/ 

4  http://www.motomapia.com/ 

5  http://www.geonames.org/ 

6  http://land.copernicus.eu/pan-european/corine-land-cover 

7  http://www.geo-wiki.org 

8  http://fotoquest.at 

9  http://www.mapmyfitness.com/ 

10  Respectively  through  MapMyRun  (http://www.mapmyrun.com/),
MapMyWalk  (http://www.mapmywalk.com/),  MapMyRide  (http://www.mapmyride.com/)
and  MapMyHike  (http://www.mapmyhike.com/).

11  https://www.bikemap.net/ 

12  http://www.bikely.com/ 

13  https://www.alltrails.com/ 

14  http://www.wikiloc.com/ 

15  https://scistarter.com/ 

16  http://www.citizensciencealliance.org/ 

17  http://wxqa.com/ 

18  https://www.wunderground.com 

19  http://www.nssl.noaa.gov/projects/ping/ 

20  http://www.ispotnature.org/communities/global 

21  http://www.inaturalist.org/ 

22  http://www.globalwaterwatch.org/ 

23  http://www.citizen-obs.eu/ 

24  http://discomap.eea.europa.eu/map/NoiseWatch/ 

25  http://www.citiesatnight.org/ 

26  https://hotosm.org/ 

27  http://earthquake.usgs.gov/data/dyfi/ 

28  http://alertos.org/ 

29  http://wikicrimes.org 

30  https://www.crimereports.com 

31  http://spotcrime.com 

32  http://bunmegelozes.amk.uni-obuda.hu/MainPageEng.php?ln=l

33  https://www.ushahidi.com/ 

34  https://www.takebackthetech.net/mapit/ 


35  http://zabatak.com/ 

36  http://www.tomtom.com/mapshare/tools 

37  https://www.ingress.com/ 

38  http://www.internetlivestats.com/google-search-statistics/ 

39  https://www.google.com/trends/ 

40  https://www.google.com/trends/correlate/ 

41  http://citydashboard.org/ 

42  http://bikes.oobrien.com/ 

43  https://petajakarta.org/banjir/en/ 

44  https://openaerialmap.org/ 


Abstract

While  there  is  now  a  considerable  variety  of  sources  of  Volunteered  Geo¬ 
graphic  Information  (VGI)  available,  discussion  of  this  domain  is  often  exem¬ 
plified  by  and  focused  around  OpenStreetMap  (OSM).  In  a  little  over  a  decade 
OSM  has  become  the  leading  example  of  VGI  on  the  Internet.  OSM  is  not  just 
a  crowdsourced  spatial  database  of  VGI;  rather,  it  has  grown  to  become  a  vast 
ecosystem  of  data,  software  systems  and  applications,  tools,  and  Web-based 
information  stores  such  as  wikis.  An  increasing  number  of  developers,  indus¬ 
try  actors,  researchers  and  other  end  users  are  making  use  of  OSM  in  their 
applications.  OSM  has  been  shown  to  compare  favourably  with  other  sources 
of  spatial  data  in  terms  of  data  quality.  In  addition  to  this,  a  very  large  OSM 
community  updates  data  within  OSM  on  a  regular  basis.  This  chapter  provides 
an  introduction  to  and  review  of  OSM  and  the  ecosystem  which  has  grown 
to  support  the  mission  of  creating  a  free,  editable  map  of  the  whole  world. 
The  chapter  is  especially  meant  for  readers  who  have  little  or  no  knowledge
about  the  range,  maturity  and  complexity  of  the  tools,  services,  applications 
and  organisations  working  with  OSM  data.  We  provide  examples  of  tools  and 
services  to  access,  edit,  visualise  and  make  quality  assessments  of  OSM  data. 
We  also  provide  a  number  of  examples  of  applications,  such  as  some  of  those 
used  in  navigation  and  routing,  that  use  OSM  data  directly.  The  chapter  finishes
with  an  indication  of  where  OSM  will  be  discussed  in  the  other  chapters
in  this  book,  and  we  provide  a  brief  speculative  outlook  on  what  the  future 
holds  for  the  OSM  project. 


Keywords 

OpenStreetMap,  geodata,  open  data,  Volunteered  Geographic  Information 
(VGI) 


1  Introduction 

The  OpenStreetMap  (OSM)  project  was  founded  in  2004  and  has  now  posi¬ 
tioned  itself  as  the  most  famous  example  of  Volunteered  Geographic  Informa¬ 
tion  (VGI)  on  the  Internet  (Jokar  Arsanjani  et  al.,  2015).  While  OSM  is  only 
one  of  many  well  established  and  well  known  VGI  projects  (See  et  al.,  2016), 
it  holds  a  dominant  position  in  the  VGI  landscape.  Chapter  2  of  this  book,  by 
See  et  al.  (2017),  gives  an  overview  of  different  sources  of  VGI  in  the  context  of 
its  usage  and  characteristics.  In  recent  years  OSM  has  attracted  very  significant 
research  attention  (Mooney,  2015)  and  could  almost  be  considered  a  field  of 
research  in  its  own  right  (Jokar  Arsanjani  et  al.,  2015);  given  the  influence  of 
OSM  on  the  VGI  and  citizen  sensor  research  landscape,  this  chapter  will  pro¬ 
vide  an  introduction  to  and  overview  of  the  OSM  project. 

OSM  was  founded  in  2004  by  then  MSc  student  Steve  Coast,  who  created  the 
idea  as  part  of  a  thesis  dissertation.  Around  that  time  the  concept  of  crowd¬ 
sourcing,  collaboration  and  Web-based  co-production  or  creation  of  knowl¬ 
edge  was  beginning  to  gain  momentum.  Coast’s  idea  was  simple:  if  I  collect 
geographic  data  about  my  area  -  where  I  have  local  knowledge  -  and  you 
collect  geographic  data  about  your  area  -  where  you  have  local  knowledge  - 
then  these  can  be  combined,  and  we  can  begin  to  build  a  spatial  database  of 
a  region.  If  this  scales  up  to  a  larger  crowd  of  people,  then  it  is  very  possible 
to  crowdsource  the  mapping  of  the  entire  world.  The  OSM  mission  statement 
grew  out  of  this  simple  idea,  which  was  to  be  a  collaborative  project  that  cre¬ 
ated  a  free  editable  map  of  the  world.  Rather  than  the  focus  being  on  outputs 
in  the  form  of  cartographic  products  and  maps,  the  core  of  OSM  is  a  spatial 
database,  which  contains  geographic  data  and  information  from  all  over  the 
world.  Many  authors  and  commentators  have  speculated  on  the  ingredients 
for  the  rapid  and  sustained  success  of  OSM  since  2004.  A  number  of  factors  are 
seen  as  having  been  influential  in  OSM’s  development.  In  the  first  instance  one 
of  these  factors  is  Web  2.0,  or  the  interactive  web  (O’Reilly,  2007),  which  facili¬ 
tates  the  development  of  large  scale  collaborative  projects  that  can  see  hun¬ 
dreds  or  thousands  of  people  contributing  simultaneously  -  the  most  famous 
example  of  this  is  Wikipedia.  Secondly,  the  availability  of  low-cost,  high-quality
and  high-accuracy  Global  Positioning  System  (GPS)  means  that  consumers 
or  citizens  can  now  collect  geographic  information  using  smart  devices  such 
as  their  smartphones  or  dedicated  GPS  units;  these  geographic  data  can  then 
be  uploaded  and  contributed  to  OSM.  The  third  factor  is  related  to  the  citizen 
contributors:  the  OSM  project  welcomes  anyone  to  register  and  take  part  as 
a  contributor.  Contributors  can  span  the  entire  spectrum  of  geographic  and 
Information  Technology  expertise:  from  beginner  or  newcomer  to  expert  level 
geographer  or  software  developer. 


1.1  How  Does  One  Contribute  to  OSM? 

The  OSM  data  model  is  very  straightforward  to  understand.  There  are  three 
primitive  data  types  or  objects:  nodes,  ways  (polygons  and  polylines)  and  rela¬ 
tions  (logical  collections  of  ways  and  nodes).  A  way  is  made  up  of  at  least  two 
nodes  (for  polylines)  or  three  nodes  (for  closed  polygons).  A  node  represents 
a  geographic  point  feature  and  its  coordinate  is  usually  expressed  as  latitude 
and  longitude.  Within  OSM,  every  object  must  have  at  least  one  attribute  or  tag 
(a  key/value  pair)  assigned  to  it  to  describe  its  characteristics.  There  are  many 
guides  and  tutorial  documents  on  how  one  begins  to  map  with  OSM;  recently 
the  company  Mapbox  provided  an  updated  set  of  documentation  for  this1.  The 
OSM  Map  Features  pages  on  the  OSM  wiki  (OpenStreetMap,  2016)  represent 
the  reference  document  describing  the  officially  adopted  OSM  tags.  These 
tags  have  been  agreed  upon  over  the  years  and  there  are  wiki  pages  written  to 
describe  the  likely  usage  and  use  case  scenarios  of  each  tag.  OSM  follows  a  folksonomy
approach  to  tagging,  and,  in  theory,  any  tag  can  be  associated  with  any
object  (Ballatore  and  Mooney,  2015).  Contributors  are  free  to  create  their  own 
tags.  As  several  authors  have  shown  (Ballatore  and  Mooney,  2015;  Ballatore 
and  Zipf,  2015),  this  can  lead  to  disagreements  amongst  contributors  or  confu¬ 
sion  on  how  to  use  specific  tags  in  certain  geographic  scenarios  (for  example 
tagging  an  object  representing  an  unpaved  pedestrian  footpath).  Services  such 
as  taginfo2  allow  exploration  and  visualisation  of  the  most  frequently  used  tags 
and  their  keys  for  the  entire  OSM  database.  The  taginfo  service  is  particularly 
useful  for  understanding  the  style  or  structure  of  tags  used  on  specific  object 
types,  conceptualising  the  very  wide  range  of  values  some  keys  are  assigned  in 
tags  and  the  spatial  distribution  of  tags.  Taginfo  is  constantly  updated  in  near 
real-time  and  stores  the  tags  from  every  object  in  the  global  OSM  database. 
There  is  no  theoretical  limit  on  the  number  of  tags  that  can  be  assigned  to  any 
object.  Nodes  that  have  a  tag  with  a  key  name  are  usually  called  Points  of  Inter¬ 
est  (POI)  and  usually  represent  the  position  of  some  object  or  structure  of  gen¬ 
eral  interest.  Keys  in  OSM  can  be  internationalised  to  accommodate  languages 
other  than  English,  which,  due  to  OSM’s  origins,  has  established  itself  as  the 
lingua  franca  of  the  project  (Ballatore  and  Mooney,  2015). 

There  are  many  software  tools  available  to  automate  the  process  of  contributing
data  or  editing  existing  data.  The  most  widely  used  and  popular  is  the
JOSM  (Java  OpenStreetMap  Editor)  tool3,  followed  by  the  Web-based  iD  editor4;  JOSM  is
acknowledged  as  being  a  software  tool  more  suited  to  more  experienced  OSM 
contributors  while  the  iD  editor  is  very  straightforward  to  use  and  is  integrated 
into  the  OSM  map  homepage.  New  data  submitted  to  OSM  or  existing  data 
edited  within  the  OSM  database  are  available  for  access  almost  immediately, 
and  the  OSM  map  on  the  OSM  homepage  will  render  changes  quickly  (within 
30  minutes).  As  we  shall  discuss  in  Section  2,  there  are  many  ways  in  which 
one  can  access  and  download  OSM  data  for  other  uses.  On  a  more  technical 
level,  every  object  within  the  OSM  database  (nodes,  ways  or  relations)  has  sev¬ 
eral  data  attributes  including:  a  globally  unique  ID;  a  version  number,  which 
indicates  how  many  times  the  object  has  been  edited;  a  timestamp  of  the  most 
recent  edit;  and  the  user  ID  and  the  username  of  the  contributor  who  created 
(or  last  edited)  the  object. 
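
The primitives and attributes described in this section can be pictured with a minimal Python sketch. The class and function names are hypothetical and chosen for this example; they are not part of any OSM software. Only the version attribute is modelled, while the timestamp, user ID and username are left out for brevity. The tag counter is a toy stand-in for the kind of statistics a service such as taginfo reports, not a description of how that service works.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A point feature with a latitude/longitude coordinate."""
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)
    version: int = 1  # Incremented each time the object is edited.


@dataclass
class Way:
    """An ordered list of node IDs forming a polyline or a closed polygon."""
    id: int
    node_ids: List[int]
    tags: Dict[str, str] = field(default_factory=dict)
    version: int = 1

    def is_closed(self) -> bool:
        # Three distinct nodes give four references, because a closed way
        # repeats its first node at the end.
        return len(self.node_ids) >= 4 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Relation:
    """A logical collection of nodes and ways, e.g. a bus route."""
    id: int
    member_ids: List[int]
    tags: Dict[str, str] = field(default_factory=dict)
    version: int = 1


def count_tag_keys(objects) -> Counter:
    """Count how often each tag key occurs across a collection of objects."""
    counts = Counter()
    for obj in objects:
        counts.update(obj.tags.keys())
    return counts


# Invented example objects: a POI node and a triangular building outline.
cafe = Node(1, 45.478, 9.227, {"amenity": "cafe", "name": "Example Cafe"})
building = Way(10, [2, 3, 4, 2], {"building": "yes"})
print(building.is_closed(), count_tag_keys([cafe, building]))
```

Storing tags as a free-form dictionary reflects the folksonomy approach: nothing in the structure restricts which keys or values a contributor may use.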

Anyone  can  sign  up  and  register  for  free  as  a  contributor  to  OSM.  In  July 
2016,  there  were  over  2.7M  registered  contributors,  as  outlined  on  the  OSM 
wiki5;  upon  sign-up,  a  contributor  can  begin  contributing  or  mapping  new 
data  in  OSM  or  editing  existing  data  stored  in  the  OSM  spatial  database.  How¬ 
ever,  it  is  not  easy  to  automatically  access  attribute  or  demographic  information 
about  these  user  contributors  from  the  OSM  database  or  associated  services. 
Several  researchers  (Neis  et  al.,  2013  and  references  therein)  have  attempted  to 
classify  and  understand  who  the  contributors  are  to  OSM  through  analysis  of 
their  editing  and  contribution  patterns  over  a  long  period  of  time. 

There  are  multiple  ways  users  can  contribute  data  to  OSM.  The  simplest  one 
is  through  the  digitisation  of  objects  (such  as  buildings,  roads  and  rivers)  that 
are  visible  on  openly  licensed  satellite  imagery.  The  most  used  imagery,  avail¬ 
able  by  default  in  the  OSM  iD  editor,  is  the  one  provided  under  a  compatible 
licence  by  Microsoft  (Coast,  2010).  While  this  way  of  contributing  data  allows 
volunteers  to  map  places  even  when  remote  from  the  mapped  place,  other 
instruments,  such  as  GPS  receivers  and  paper-based  tools  like  Field  Papers6, 
allow  users  to  physically  survey  an  area  and  then  upload  or  insert  the  informa¬ 
tion  into  the  OSM  database.  One  of  the  more  controversial  methods  of  contrib¬ 
uting  data  to  the  OSM  database  is  through  the  bulk  import  of  suitably  licensed 
geographic  data.  The  pros  and  cons  of  taking  a  geographic  dataset  produced 
outside  of  OSM  and  importing  it  into  the  OSM  database  have  been  discussed 
by  many  authors  (Zielstra  et  al.,  2013),  and  the  issue  remains  a  contentious  one
amongst  the  OSM  community.  One  of  the  most  powerful  arguments  against 
this  bulk  import  is  that  it  goes  against  the  very  ethos  of  OSM  that  data  be  col¬ 
lected  or  mapped  by  OSM  contributors  based  on  an  ability  to  verify  the  quality 
of  the  data,  an  ability  itself  founded  on  local  knowledge,  physical  collection  of  the
data  or  geographic  expertise.  Many  examples  of  bulk  import  are  available  on 
the  OSM  wiki  website7,  with  the  TIGER  data  import  of  roads  and  highways 
into  OSM  United  States  and  the  CORINE  Land  Cover  map  import  into  OSM
France  amongst  the  most  well  known  and  controversial. 

The  remainder  of  this  chapter  is  organised  as  follows:  in  the  next  section,  we 
provide  an  overview  of  how  OSM  is  accessed,  visualised  and  used  in  research, 
software  development  and  other  applications.  In  the  final  section  of  the  chapter, 
we  provide  some  concluding  remarks  and  points  for  discussion  on  OSM;  we 
also  outline  where  the  reader  will  find  more  discussion  of  and  information  on 
OSM  in  the  subsequent  chapters  of  this  volume.  The  overall  purpose  of  this
chapter  is  to  introduce  readers  unfamiliar  with  OSM  to  the  project  and  the
types  of  applications  it  is  currently  used  for.  We  leave  it  to  other  chapters  in  this
volume  to  describe  specific  aspects  of  OSM  (data  quality,  visualisation  of  OSM,
motivations  of  contributors,  etc.)  in  more  technical  detail.


2  Applications  Using  OSM  Data 

In  the  introductory  section  of  this  chapter,  we  mentioned  that,  while  much 
of  the  focus  of  OSM  is  on  the  maps  and  cartographic  products  derived  from 
the  OSM  data,  the  core  product  of  OSM  is  the  spatial  database.  This  second 
section  will  provide  a  comprehensive  list  of  a  number  of  projects,  organisa¬ 
tions,  services,  software  and  applications  that  make  direct  use  of  OSM  data, 
with  references  and  links  provided  at  the  end  of  the  chapter.  A  number  of  such 
lists  and  descriptions  are  available  on  the  Internet  (e.g.  on  the  OSM  wiki8),  but, 
to  the  authors’  knowledge,  this  is  the  first  list  provided  in  an  academic  paper. 
Due  to  the  free  and  open  availability  of  OSM  data  and  the  increasing  popular¬ 
ity  of  OSM  worldwide,  it  would  be  impossible  to  list  all  of  the  existing  projects 
and  applications.  Making  use  of  OSM  data  has  become  so  easy  and  immediate 
that  new  tools  are  created  almost  every  day.  Some  of  these  applications  become 
very  popular  and  well  known  while  other  applications  are  limited  to  single 
languages  or  user  groups.  Therefore  we  limit  the  items  on  this  list  to  what  we 
consider  from  our  knowledge  of  OSM  to  be  the  most  popular,  up-to-date  and 
successful  applications  based  on  OSM  data.  The  description  of  each  item  on 
the  list  serves  as  a  reference  and  starting  point  for  readers  having  no  or  limited 
experience  in  OSM. 

We  understand  that  links  to  online  services  and  websites  change  over  time 
and  can  become  obsolete  or  broken.  However,  with  this  in  mind,  the  list  itself 
serves  as  a  commentary  on  the  diversity  of  application  areas  where  OSM  is 
used.  We  organise  the  list  under  the  following  headings:  Data  Download 
Applications  and  Services,  Education  and  Research  Use  of  OSM,  Disaster  and 
Humanitarian  OSM,  Government  and  Industry  Usage,  Visualisation  of  OSM 
Data,  Software  (OSM  Editors,  Routing  Services,  Vector  Rendering,  other  ser¬ 
vices),  Quality  Assurance  for  OSM,  and  Games  and  Leisure.  For  more  applica¬ 
tions  and  services,  a  very  extensive  list  is  maintained  on  the  OSM  wiki9. 


2.1  Data  Download  Applications  and  Services

Regardless  of  the  types  of  applications  and  visualisations  that  can  be  produced 
with  OSM,  the  applications  and  services  that  provide  access  to  the  data  within 
the  OSM  database  are  arguably  the  most  important  part  of  OSM’s  data
architecture.  Geofabrik  is  one  of  the  best  known  providers  of  access  to  OSM 
data  and  provides  access  to  continental-,  national-  and  regional-sized  data 
extracts10;  the  data  are  updated  very  frequently  (at  least  hourly)  and  are  provided
in  a  number  of  different  formats.  The  OSM  wiki  provides  access  to  the
so-called  Planet.osm  file11,  which  is  the  entire  OSM  database  contained  in  one
very  large  XML  or  compressed  format  file.  This  file  is  updated  every  few  days. 
The  wiki  page  lists  many  mirror  servers  providing  access  to  the  Planet.osm 
file,  with  many  of  these  servers  providing  the  file  updated  on  an  hourly  basis. 
OSM  also  provides  an  API12  that  allows  extracting  and  saving  raw  data  from/to 
the  OSM  database.  There  are  API  calls  to  create,  read,  update  and  delete  map 
data  for  OSM,  and  this  provides  software  developers  and  applications  with 
the  most  up-to-date  data  available.  However,  queries  for  very  large  amounts 
of  data  (such  as  city-  or  country-sized)  are  discouraged  and  disallowed.  The
Overpass  API  service13,  with  its  popular  frontend  Overpass  Turbo14,  is  a  read-only
API  that  allows  access  to  selected  parts  of  the  OSM  map  database;  clients
send  queries  using  a  special  API  query  language  or  using  the  graphical  inter¬ 
face  provided  by  Overpass  Turbo.  The  Overpass  API  also  allows  programmatic 
calls  for  data  extracts  of  arbitrary  geographic  size.  The  commercial  company 
Mapzen  provides  OSM  data  for  download  in  city-  or  region-based  extract  sizes 
from  their  Metro  Extracts15  service:  a  number  of  data  formats  are  provided 
and  their  data  extracts  are  updated  on  a  weekly  basis.  A  simple  and  popular 
way  to  download  small  amounts  of  OSM  data  is  provided  on  the  OSM  homepage
and  consists  in  using  its  ‘export’  feature16.  This  allows  users  to  browse
the  OSM  map  and  select  a  small  region  using  a  bounding  rectangle;  the  OSM  data
for  that  region  can  then  be  downloaded  to  the  calling  device.  All  of  the  services  mentioned
so  far  provide,  as  standard,  OSM  data  in  the  default  OSM  XML  data  format17.
Like  most  types  of  XML,  OSM  XML  requires  special  software  tools  in  order  to
be  processed,  and  there  are  many  options  available  for  this  task18.  Data  pro¬ 
viders  such  as  Geofabrik19  and  Mapzen20  also  provide  OSM  data  in  common 
formats,  such  as  SHP  files:  this  allows  users  to  process  and  visualise  the  data 
using  desktop  GIS  tools. 
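
To make the access route described above more concrete, the following Python sketch builds an Overpass QL query for a bounding box and then parses a small OSM XML document with the standard library. The function names are hypothetical, the sample XML is hand-written with invented values, and no request is actually sent to an Overpass server. Real OSM XML carries further attributes (such as timestamp and user) that are omitted here.

```python
import xml.etree.ElementTree as ET


def build_overpass_query(south, west, north, east, key, value):
    """Build an Overpass QL query for nodes carrying key=value inside a bounding box."""
    return (
        "[out:xml][timeout:25];"
        f'node["{key}"="{value}"]({south},{west},{north},{east});'
        "out body;"
    )


def parse_nodes(osm_xml):
    """Extract id, version, coordinates and tags of every node in an OSM XML string."""
    root = ET.fromstring(osm_xml)
    nodes = []
    for element in root.iter("node"):
        nodes.append({
            "id": int(element.get("id")),
            "version": int(element.get("version")),
            "lat": float(element.get("lat")),
            "lon": float(element.get("lon")),
            # Each <tag> child holds one key/value pair.
            "tags": {tag.get("k"): tag.get("v") for tag in element.findall("tag")},
        })
    return nodes


# Hand-written sample in the OSM XML layout; the values are invented for illustration.
SAMPLE = """<osm version="0.6">
  <node id="1" version="3" lat="45.4781" lon="9.2273">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Example Cafe"/>
  </node>
  <node id="2" version="1" lat="45.4790" lon="9.2280"/>
</osm>"""

print(build_overpass_query(45.4, 9.1, 45.5, 9.3, "amenity", "cafe"))
for node in parse_nodes(SAMPLE):
    print(node["id"], node["tags"])
```

For country-sized extracts this in-memory approach would not scale, which is one reason the pre-built extracts and dedicated processing tools mentioned above are generally preferred.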


2.2  Education  and  Research  Use  of  OSM 

The  ability  to  access  the  entire  OSM  spatial  database  on  an  hourly  basis  or  even 
more  frequently  has  proved  a  great  attraction  for  the  research  community  over 
the  past  number  of  years  (Jokar  Arsanjani  et  al.,  2015).  There  has  been  a  steady 
increase  year-on-year  in  the  number  of  papers  being  produced  by  the  academic
community  in  the  domain  of  VGI,  and  OSM  forms  a  major  component  of  this
work.  In  2015,  one  of  the  first  edited  volumes  on  OSM  as  a  research  topic  was 
published  (Jokar  Arsanjani  et  al.,  2015);  the  volume  considered  OSM’s  role  in 
GIScience  and  contained  a  very  wide  range  of  research  topics,  from  navigation 
and  routing  to  data  quality  and  visualisation.  Similarly,  two  EU  COST  Actions 
focused  on  VGI  that  ran  from  2012  to  2016,  TD1202  ‘Mapping  and  the  Citi¬ 
zen  Sensor’  (from  where  this  volume  comes)21  and  IC1203  ‘ENERGIC’22,  have 
produced  some  excellent  research  around  OSM.  In  other  educational  settings, 
a  repository  such  as  TeachOSM23  provides  a  set  of  community-contributed
resources  for  teachers,  trainers,  educators  and  instructors  who  want  to  bring 
OSM  into  their  classrooms.  The  classroom  can  be  a  very  important  setting  for 
educating  the  next  generation  of  OSM  mappers  or  contributors.  There  are  many 
examples,  including  ‘a  world-record  humanitarian  mapathon  that  took  place  at 
the  Politecnico  di  Milano  in  northern  Italy  in  March  2016’24.  This  mapathon
event  involved  over  two  hundred  children  from  six  elementary  schools  in  the 
Milan  province.  This  mapathon  resulted  in  the  mapping  of  over  5000  buildings 
in  Swaziland  (Ebrahim  et  al.,  2016).  More  information  can  also  be  found  in 
Chapter  5  of  this  book,  by  Fritz  et  al.  (2017). 


2.3  Disaster  and  Humanitarian  OSM 

OSM  data  and  mapping  have  been  used  extensively  in  recent  disaster  and
humanitarian  emergencies  and  operations  all  over  the  world.  The  Humanitar¬ 
ian  OpenStreetMap  Team  (HOT)25  is  a  nonprofit  organisation  leading  the  inter¬ 
national  efforts  in  community  mapping  projects.  Through  its  open  source  Task¬ 
ing  Manager26,  HOT  coordinates  online  collaborative  mapping  based  on  OSM 
when  a  major  disaster  strikes  anywhere  in  the  world,  such  as  during  the  Nepal
earthquake  in  2015  and  the  Japan  and  Ecuador  earthquakes  in  2016;  in  regions 
such  as  Nepal,  OSM  very  often  is  the  only  available  source  of  mapping  data  and 
cartography  that  rescuers  and  aid  agencies  can  use.  The  Missing  Maps  project27 
is  an  open,  collaborative  humanitarian  project  aiming  to  map  the  most  vulner¬ 
able  places  in  the  developing  world.  Missing  Maps  founders  and  members  are 
mainly  humanitarian  organisations  (e.g.  the  American  Red  Cross  and  Doctors 
Without  Borders)  and  NGOs;  the  project’s  volunteered  mapping  is  again  based 
on  OSM  data  and  the  HOT  Tasking  Manager.  The  University  of  Heidelberg 
hosts  the  disastermappers  project28,  which  aims  to  educate  and  train  university 
students  about  mapping  in  OSM  for  humanitarian  purposes.  Reaction  time  is 
often  very  quick  and  successful  with  OSM.  Examples  include  a  5-day  period 
of  mapping  where  the  Humanitarian  OSM  Team  and  volunteers  mapped  over 
100,000  buildings  and  hundreds  of  miles  of  roads  in  Guinea  when  Ebola  broke 
out  in  201429.  The  efforts  of  the  OSM  community  in  times  of  humanitarian  cri¬ 
sis  are  easy  to  visualise,  as  snapshots  of  OSM  data  can  be  extracted  to  show  the 
effects  of  mapping  before  and  after  a  particular  event.  HOT  shows  the  changes30 
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in  the  OSM  map  that  occurred  after  the  city  of  Tacloban  in  the  Philippines  was 
devastated  by  the  super  typhoon  Haiyan  in  2013. 


2.4  Government  and  Industry  Usage 

OSM is used in industry and by government agencies around the world.
Indeed, a large number of companies listed on the OSM wiki³¹
provide consultancy based on OSM data. This consultancy covers a wide range of
applications, including Web-based mapping, Web GIS, data analysis, routing
and navigation, and data extraction. Several leading companies operate in
this domain, including Mapbox³², MapQuest³³, Stamen³⁴, Mapzen³⁵,
CampToCamp³⁶ and Geofabrik¹⁸. Most of these companies also provide OSM services
back to the OSM user community, including OSM data extracts, web-map layers
for online mapping and specialist visualisation.

Government usage of OSM is more difficult to track unless it is advertised
and highlighted by the government agencies involved. From the opposite
direction, there has been significant use of government data in OSM, with several
high-profile data imports having been performed over the years. These imports
depend on the source data having an acceptable open data licence that allows
the corresponding geodata to be inserted into the OSM database. The imports
include: the TIGER (Topologically Integrated Geographic Encoding and
Referencing system) data, produced by the US Census Bureau, in the USA;
plan.at in Austria; GeoBase as a complete map of Canada; and the CORINE
Land Cover map in France.

In 2013, New York City opened up many ‘high-value datasets to the public,
making it possible to use these data to improve OSM’³⁷, facilitated and
assisted by Mapbox³⁰. ‘In return, New York City’s GIS team is informed of
changes made in OSM related to their datasets, which helps keep their map
data current.’ This effectively made the New York City municipality a participant
in and contributor to OSM in the United States. MapGive³⁸ is an initiative
of the US Department of State’s Humanitarian Information Unit, ‘mak[ing]
it easy for new volunteers to learn to map and get involved in online tasks’.
Portland’s TriMet traffic authority uses OSM to power its multi-modal traffic
planner³⁹. The Gendarmerie Nationale (one of the national police forces in
France) uses OSM maps inside its police cars⁴⁰. The CROWDGOV report
by Haklay et al. (2014) gives a number of examples of governmental use of
OSM around the world. There is still some reluctance among government agencies
to use VGI and OSM as a complement to their own sources of spatial
data (Olteanu-Raimond et al., 2017b); however, examples do exist, such as
the French National Address Database (BAN), which ‘associates each address
listed on the French territory (25 million addresses) with its geographic
coordinates’ (the database ‘does not contain any nominative data’). BAN is
the result of ‘an innovative collaboration model between public authorities’
in France and OSM France ‘to build an essential reference for the economy,
society and public services’⁴¹.


2.5  Visualisation  of  OSM  Data 

Anecdotal evidence suggests that visualisation is one of the
most popular applications of OSM data. It is facilitated
by the flexible availability of the OSM data (see Section 2.1) and the very
wide range of visualisation tools available, which can process OSM
data natively, either directly or from a spatial database. There are a vast
number of examples, and we provide a small selection here to illustrate the
breadth of applications.

OpenTopoMap⁴² provides a topographic visualisation of OSM data combined
with SRTM elevation data. The map tiles in OpenTopoMap are available
for use as a web-map layer in other applications. OpenCycleMap⁴³ is an
OSM rendering ‘primarily aimed at showing information useful to cyclists’. The
OpenCycleMap global cycling map is based on data from OSM and is updated
frequently. The OpenCycleMap website indicates that ‘at low zoom levels, it is
intended for overviews of national cycling networks; at higher zoom levels, it
should help with planning which streets to cycle on, where cyclists can park
their bikes, etc.’ It is also available for use as a web-map layer in other
applications. In a similar fashion, the Hike & Bike Map⁴⁴ visualisation of OSM data
uses a specific cartographic style to highlight hiking and biking routes.
The OpenSnowMap⁴⁵ is an OSM-based map rendering
of ski slopes and lifts. It integrates OSM data, MODIS/Terra Snow Cover 8-Day
Global data⁴⁶ and SRTM 90m Digital Elevation data. As of December 2016,
over 100,000 km of skiing trails had been mapped. OsmHydrant⁴⁷ is a
special map showing the position of hydrants, water tanks and suction points,
with the purpose of assisting local authorities and fire departments. While
its emphasis is on visualisation, it allows OSM contributors to map new
hydrants and edit the existing ones. As of July 2016, almost 45,000 hydrants had
been added. OpenFireMap⁴⁸ is an OSM rendering highlighting ‘fire stations,
hydrants, water tanks, and ponds used for firefighting (suction points)’. It does
not provide editing facilities directly. The Stamen company in the United States
provides several cartographic variations on the standard OSM map
representations. These are available for use as web-map layers in other applications. Three
of the most popular web-maps provided by Stamen are the terrain
representation⁴⁹, the black and white representation⁵⁰ and the very artistic watercolor
representation⁵¹. There is also a good deal of visualisation of OSM in 3D: one
of the best examples is the OSM Buildings⁵² JavaScript library for visualising
OpenStreetMap building geometry on 2D and 3D maps. F4map⁵³ is a French
company providing cartography and visualisation services: one of its products
is a 3D visualisation of the world using OSM data. In other types of
visualisation, Kothic JS⁵⁴ is a technology, still in development, that renders OSM data
‘on the fly’ using HTML5 without the need for raster tile images. Mapbox
Studio⁵⁵ is a suite of free and paid-for tools to produce ‘vector tiles’, which can be
rendered either server-side or client-side, with many different customisations
available according to the OSM data being used.
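
Several of the renderings above are offered as web-map layers made up of map tiles at different zoom levels. As a minimal sketch, and assuming that a layer follows the widely used XYZ ('slippy map') tiling scheme in the Web Mercator projection, the tile containing a given point can be computed as follows. The function names and the URL template are hypothetical and are not taken from any of the services listed here.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS84 point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern or southern edge stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(template, lon_deg, lat_deg, zoom):
    """Fill a '{z}/{x}/{y}' style URL template (hypothetical) for a point."""
    x, y = lonlat_to_tile(lon_deg, lat_deg, zoom)
    return template.format(z=zoom, x=x, y=y)


if __name__ == "__main__":
    # Placeholder host name, used only to show the shape of a tile request.
    print(tile_url("https://tiles.example.org/{z}/{x}/{y}.png", 0.0, 0.0, 1))
```

Each additional zoom level doubles the number of tiles along each axis, which is why low zoom levels suit national overviews and higher zoom levels suit street-level planning.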


2.6  OSM-based  Software 

As mentioned above, the OSM community has created a vast ecosystem of
software tools and services. As is the case with the visualisation of OSM data, it is
not possible to give an exhaustive list of software. We have organised this
section into three subsections: OSM data editors, OSM-based routing services and
other services.


2.6.1  OSM  Data  Editors 

OSM is an openly accessible spatial database to which any contributor can supply
geodata and whose existing data any contributor can also edit. It is therefore
very important that software tools be available to support this editing work
for contributors. The OSM wiki contains an extensive list of OSM data editing
tools⁵⁶ and a comparison of their characteristics. In this section we outline five
of the best-known OSM editors. The iD editor⁵⁷ is a Web-based
editor for OSM and is the editor that is integrated into the OSM homepage. The
JOSM editor³ is a Java editor for OSM and is considered an editor for skilled
OSM contributors. It ‘supports loading GPX tracks, background imagery and
OSM data from local sources as well as from online sources and allows’ direct
editing of the OSM data; a number of plugins provide other advanced
functions. Potlatch⁵⁸ is a Flash-based web editor for OSM. Vespucci⁵⁹ is the first
OSM editor specifically developed for small and large Android-based devices;
it provides a reasonably extensive set of editing functionalities, which makes it
usable in the field by novice and experienced OSM contributors. Merkaartor⁶⁰
is a desktop-based software editor for OSM that is available for installation and
use on most operating systems; like JOSM and Vespucci, Merkaartor
provides a wide range of functionalities.


2.6.2  OSM-based  Routing  Services 

OSM-based routing services are software-based solutions that use the data
in the OSM database to generate routing and navigation
solutions. Routing and navigation are possible when objects in OSM have
attributes (tags) that are helpful in solving these problems. The ability to
apply attributes from different thematic areas to the same object (such as
a road or a street) means that different routing applications can be easily
developed.
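
To illustrate how tags make different routing applications possible on the same data, the following minimal sketch runs a shortest-path search over a toy network in which each edge carries a 'highway' tag, and a per-mode profile decides which edges may be used. The network, the node names and the two profiles are invented for illustration; this is not the implementation of any of the routing engines described below.

```python
import heapq

# Toy network: (from, to, length in metres, tags). All values are invented.
EDGES = [
    ("A", "B", 500, {"highway": "motorway"}),
    ("B", "D", 500, {"highway": "motorway"}),
    ("A", "C", 700, {"highway": "residential"}),
    ("C", "D", 700, {"highway": "cycleway"}),
    ("A", "D", 1600, {"highway": "residential"}),
]

# A profile lists the 'highway' values that a mode of travel may use.
PROFILES = {
    "car": {"motorway", "residential"},
    "bicycle": {"residential", "cycleway"},
}


def shortest_path(edges, profile, start, goal):
    """Dijkstra search restricted to the edges allowed by the profile."""
    allowed = PROFILES[profile]
    graph = {}
    for u, v, length, tags in edges:
        if tags.get("highway") in allowed:
            graph.setdefault(u, []).append((v, length))
            graph.setdefault(v, []).append((u, length))
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, length in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + length, nxt, path + [nxt]))
    return None  # goal not reachable with this profile


if __name__ == "__main__":
    print(shortest_path(EDGES, "car", "A", "D"))
    print(shortest_path(EDGES, "bicycle", "A", "D"))
```

The same five edges yield different routes for the two modes because only the tag filter changes; the search itself is identical.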

The Open Source Routing Machine (OSRM)⁶¹ is a C++ routing engine for
finding ‘shortest paths in road networks’. It supports car, bicycle and walk modes
and is ‘easily customized through profiles’. GraphHopper⁶² is a company based
in Germany focused on delivering the ‘fastest possible routing algorithms’ and
‘privacy protection’ using open source software for its customers. Its open
source routing library and server include elevation data and allow routing
for several different vehicle types. The MapQuest Directions API⁶³ is offered
by the US company MapQuest and calculates ‘point-to-point, multipoint, and
optimized routes’. The API can be used by any application, and the directions
are based on OSM data. OpenRouteService⁶⁴ is a routing service developed by
the GIScience Research Group at Heidelberg University (Germany); it provides
routing capabilities for different categories of user (including wheelchair users),
features an advanced graphic interface and is also available in a mobile version.
Kurviger⁶⁵ is a specialised routing service for motorcyclists, which computes
optimal paths considering the topography of the terrain. It is only available in
German. Cruiser for Android⁶⁶ is an Android-based mapping and navigation
application. Wheelmap.org⁶⁷ is an open and free online map of
wheelchair-accessible places. While it is not a routing application per se, it provides
information on the wheelchair accessibility of public places, which is very useful
for wheelchair users, and allows contributors to edit OSM directly to provide
accessibility information. ViaMichelin⁶⁸ is a ‘wholly owned subsidiary of
the Michelin Group’⁶⁹; it ‘designs, develops and markets digital travel assistance
products and services for road users in Europe’, and the German version of
its route planner uses an OSM Outdoor Layer visualisation⁷⁰. INRIX Traffic⁷¹
is a commercial product for navigation and traffic information that uses OSM
data; the application learns the preferences and daily routines of the user and,
based on the learned activities, makes a daily personalised itinerary with the
anticipated tours and frequently used routes.


2.6.3  Other  Services 

In this section, we provide some links to other services that use OSM but do not
necessarily fit neatly inside our classifications. In OSM, nodes that have
specific tags are often called POI amongst contributors and users of OSM. There
is no absolute set of tags that qualifies a node as a POI, but usually a POI will
have tags related to amenities, such as buildings, shopping, education or
buildings with cultural and historical significance. The OpenPoiMap⁷² provides a
map-based visualisation of all POI in OSM for any part of the world: POI are
presented as individual layers, which can be turned on or off, and, based on
the information that the map displays, contributors can then edit
the POI data directly in OSM using the links provided on the interface. The
Places! service⁷³ attempts to visualise patterns
in place names within given countries based on the OSM database for those
countries. For example, Places! tries to find patterns in the spatial distribution
of places in Switzerland containing the term ‘berg’ or places in the United
Kingdom containing the term ‘hill’ in their name. The analysis is performed offline
and updated regularly.
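
The kind of place-name analysis described for Places! can be sketched in a few lines. The example below counts, per grid cell, the places whose name contains a given term; it is only an illustration under our own assumptions (a hand-made sample of named places with approximate coordinates and a one-degree grid) and does not reflect how the Places! service itself is implemented.

```python
import math
from collections import Counter

# Hypothetical sample of named places: (name, longitude, latitude).
PLACES = [
    ("Heidelberg", 8.69, 49.41),
    ("Kirchberg", 7.58, 47.08),
    ("Bergdorf", 7.60, 47.10),
    ("Lausanne", 6.63, 46.52),
]


def name_pattern_counts(places, term, cell_deg=1.0):
    """Count places whose name contains `term`, per grid cell of `cell_deg` degrees."""
    counts = Counter()
    for name, lon, lat in places:
        if term.lower() in name.lower():
            cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
            counts[cell] += 1
    return counts


if __name__ == "__main__":
    for cell, n in name_pattern_counts(PLACES, "berg").most_common():
        print(cell, n)
```

Mapping the per-cell counts then shows where a naming pattern is concentrated.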

The OSM Analytics⁷⁴ application recently launched by HOT provides
interactive functionality to analyse how specific OSM features are mapped in a
specific region. This tool allows the user to select the geographic region of interest
and shows a graph of the mapping activity in that region. It is possible to select
a specific time interval to view the number of newly mapped or edited features
in that period; the map will highlight the buildings matching this
time interval. This tool is a very useful way to obtain a high-level view of how
OSM developed in a particular region. Finally, the Show-Me-The-Way
application⁷⁵ is an interactive web application that displays near real-time edits
performed by contributors to OSM. The application loads recent edits and displays
them by jumping to the particular region where the edit was made. This type
of visualisation is possible because very recent edits submitted to
OSM by contributors are immediately available to anyone who
connects to the OSM API or other services listed in Section 2.6.


2.7  Quality  Assurance  for  OSM 

The quality of OSM data is under constant scrutiny by the scientific
community. It is also one of the major concerns that industry and
authoritative agencies such as National Mapping Agencies (NMAs), Land and
Cadastral Agencies and other types of government agencies have about OSM
(Olteanu-Raimond et al., 2017b). In practice, there is no single set of metrics
or criteria against which OSM can be measured that will satisfy all users for the
myriad of possible end applications. The quality of the OSM data and its suitability
for a particular application, purpose or use case are very much dependent on the
characteristics of the problem being tackled. The OSM community recognises
the importance of data quality, and a very wide range of tools and applications
has been developed to tackle this issue. In this section, we introduce
a small number of these. A comprehensive list is maintained on the
OSM wiki⁷⁶.

BBBike and Geofabrik deliver the OSM Map Compare tool⁷⁷, which allows
visual comparison of OSM map layers with other popular mapping systems
such as Google, Bing, HERE, ESRI, etc. The web map interface allows users to
visually compare any region in OSM with the corresponding mapping in the
other popular systems. IGN France (French National Institute of Geographic
and Forest Information) provides a very similar system to Map Compare with
its Ma Visionneuse⁷⁸ application, which allows OSM to be compared with
IGN layers, amongst others; this is particularly useful for comparison between
French web map layers. The OSM Inspector⁷⁹, also by Geofabrik, provides an
overlay of potential errors or data quality problems onto an OSM map. These
problems include: very long ways (polylines); self-intersecting ways; polygons
or polylines that are represented by only one node; and polygons or
polylines that have duplicate nodes contained within them.
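
Three of the geometry problems listed above can be checked with very little code. The sketch below is our own illustration and not the OSM Inspector's implementation: it assumes that a way is given as an ordered list of planar (x, y) node positions, treats only consecutive repeated nodes as duplicates (so that a closed way may legitimately reuse its first node at the end), and reports only proper crossings between non-adjacent segments.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1 or 0."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _properly_cross(a, b, c, d):
    """True if segments ab and cd cross at a point interior to both."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def check_way(coords):
    """Return a list of problem labels for one way.

    coords: ordered list of (x, y) node positions, assumed planar.
    """
    problems = []
    if len(set(coords)) < 2:
        problems.append("single node")
    if any(a == b for a, b in zip(coords, coords[1:])):
        problems.append("duplicate consecutive nodes")
    segments = list(zip(coords, coords[1:]))
    for i in range(len(segments)):
        # Skip the neighbouring segment, which always shares an end node.
        for j in range(i + 2, len(segments)):
            if _properly_cross(*segments[i], *segments[j]):
                problems.append("self-intersection")
                return problems
    return problems


if __name__ == "__main__":
    print(check_way([(0, 0), (2, 2), (2, 0), (0, 2)]))
```

The pairwise segment test is quadratic in the number of nodes, which is acceptable for a sketch; checks over large datasets would normally rely on a spatial index.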

Taginfo² is a very popular Web-based application that displays up-to-date
statistics about the tags used in the OSM database, e.g. which tags are used, how
many times they are used, where a certain tag occurs, etc. Taginfo is
particularly useful for finding problems with the keys or values in tags, the popularity
of tags, where specific tags are used and which other tags are used in combination
with them. Its usefulness for finding problems with tagging stems from its
very comprehensive ranking of the popularity/application of values for
specific keys in tags. This can quickly allow an OSM expert to identify instances
of an incorrect assignment of values in tags that affect overall tag
data quality. Taginfo does not provide any information on errors relating to
geometry or topology. Osmose⁸⁰, an acronym for OpenStreetMap Oversight
Search Engine, is a quality assurance tool for detecting issues in OSM data;
it is also useful for integrating third-party datasets. It tries to detect anomalies
in the data and then displays them on an OSM map, from which contributors
can fix or update them. Keep Right⁸¹ is one of the oldest quality assurance tools
in OSM. It displays automatically detected errors on the OSM map or in a list
format, and it detects a very wide set of error types, including geometry errors,
topological errors, attribution errors and other general OSM errors.
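
The idea of ranking the values used for a key, as described for Taginfo above, can be sketched as follows. The sample features, the misspelt value and the 5% threshold are our own assumptions for illustration; the code does not use Taginfo's data or API.

```python
from collections import Counter

# Invented sample: each feature is represented by its dictionary of tags.
SAMPLE = (
    [{"highway": "residential"}] * 30
    + [{"highway": "primary"}] * 9
    + [{"highway": "residental"}]  # deliberate misspelling
    + [{"building": "yes"}]
)


def rank_values(features, key):
    """Rank the values used for `key`, most frequent first."""
    counts = Counter(tags[key] for tags in features if key in tags)
    return counts.most_common()


def rare_values(features, key, max_share=0.05):
    """Values whose share of all uses of `key` is at most `max_share`."""
    ranked = rank_values(features, key)
    total = sum(n for _, n in ranked)
    return [value for value, n in ranked if n / total <= max_share]


if __name__ == "__main__":
    print(rank_values(SAMPLE, "highway"))
    print(rare_values(SAMPLE, "highway"))
```

A rarely used value is not necessarily wrong, so a list of this kind only gives an experienced contributor candidates to inspect.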

MapRoulette⁸² is a Web-based application that proposes challenges to fix
errors in OSM. Each challenge represents a set of tasks, and OSM contributors
can fix the errors by performing edits in OSM in the usual way. The challenges
vary in difficulty, allowing contributors to choose the types of errors that they
feel confident about fixing. The fixing relies very heavily on the
contributors’ interpretation of information from aerial imagery. DeepOSM⁸³ attempts
to detect problems in OSM road networks using neural networks. The system
downloads satellite imagery and the corresponding OSM data that show
roads/features for that area. This allows DeepOSM to generate training and
evaluation data for the neural networks, which then calculate predictions of
mis-registered roads in OSM.

The Grass&Green project (Ali et al., 2016) asks OSM contributors to correct
the tagging or classification of land use features involving grass or green areas.
This application provides a two-screen interface, where an OSM feature is
highlighted on the standard OSM web-map layer and in aerial imagery. The
user (who needs to have an OSM account) must then provide an appropriate
classification for this entity by choosing what they believe is correct from
the list of classifications: grass, park, garden, forest and meadow. The JOSM
Validator⁸⁴ is a core feature of JOSM which checks and fixes invalid data that
have been contributed to OSM or are being contributed for the first time. The
validator checks and fixes a wide variety of problems, including topological
errors, unclosed polygons and overlapping areas.

Academic  research  has  produced  a  wide  range  of  quality  assessment  and 
comparison  tools  for  OSM  (Ostermann  and  Granell,  2017).  One  of  the  most 
recently  published  is  that  of  Brovelli  et  al.  (2017):  this  open  source  software  tool 
provides  an  automated  comparison  of  street  network  data  in  OSM  with  that  in 
an  authoritative  dataset.  Users  of  the  tool  must  provide  the  authoritative  dataset 
for  comparison. 


2.8  Games,  Leisure  and  General  Public  Information 

In this final section on applications of OSM, we describe a mixture of
applications that use OSM for the purposes of games, leisure or general public
information.

‘Collapse - The Division Game’⁸⁵ is a simulation game based on open
datasets (including OSM data), created by Ubisoft to introduce the environment
upon which the new online action game ‘Tom Clancy’s The Division’ (for
Windows, PlayStation and Xbox)⁸⁶ is based. The user is the first person in the
world infected with a virus, and the game realistically simulates the diffusion
of the virus until the collapse of society; OSM data relating to health
facilities, societal infrastructure and transportation are used in the simulation. The
OSM game Kort⁸⁷ is very similar to MapRoulette⁷⁹, with the exception that Kort
takes a gamification approach to OSM error fixing. Kort was developed for
usage mainly on mobile devices but also works well on most browsers. Points
(so-called Koins) can be earned both for solving tasks and for checking existing
solutions. The goal is to continually rise through the ranks of the high-score list.
Additionally, players are awarded medals for their efforts. At the time of
writing, over 2,000 active players have solved almost 50,000 tasks.
The solutions to tasks must be evaluated and accepted by other users before
they are submitted to the OSM database.

In a YouTube video⁸⁸, an OSM contributor provides a video-based visualisation
of the contribution of nodes to OSM over the period 2004-2016. Nodes
in OSM that have had more editing activity on them are coloured using a
heat-map approach. This timelapse video and many others listed on the OSM wiki⁸⁹
provide a very good high-level overview of how OSM has developed since its
inception. The node density map by tyrasd⁹⁰ provides a static visual overview
of how many nodes are mapped within any OSM region. Lukas Martinelli⁹¹
produced a Global Noise Pollution map based on the urban infrastructure
data in OSM for cities and urban areas. GoodCityLife is a group of freelance
researchers in urban dynamics who use OSM to produce visualisations. One
such visualisation is their Smelly Maps⁹², which uses the underlying OSM data
for a city or region to calculate whether there are likely to be unpleasant odours
or smells in a locality. Bahnhof.de⁹³ is the website providing information about railway
stations in Germany; OSM is used as the base layer for the mapping on this
information website. The flight simulation software World2XPlane by X-Plane⁹⁴,⁹⁵
is also worth mentioning; this software takes OSM data and converts the data
into scenery for X-Plane. It uses as much information as possible to generate
highly realistic scenery.


3  Conclusions  and  Discussion 

In this chapter, we have provided an overview of the OSM project. As
mentioned in the introduction, OSM is probably the most famous example of VGI
on the Internet today. Even at the time of writing (during the summer of 2016),
the project continued to grow and expand, with over 2.7M registered
contributors/users and almost 3.4B nodes of data, which made up almost 350M
polygons and polylines. Around 37,000 contributors are active in OSM during a
typical month. OSM can certainly claim to be the largest freely and openly
accessible database of geographic data in the world. Indeed, its rate of growth
in terms of geographic data and frequency of contributions and editing brings
OSM into the realm of geographic big data (Leonelli, 2014). When one
considers the extended OSM ecosystem of open source software, data download
services, data visualisation services, wiki help systems, mailing lists and forums,
OSM serves as a very suitable starting point for any discussion on VGI. Indeed,
one could speculate on how VGI would have developed if OSM had been absent
from this space. This chapter has attempted to give the reader who is new to
OSM an introduction to the OSM ecosystem while providing the reader
familiar with OSM an overview of where OSM currently stands in the world of VGI.

In the remaining chapters of this book, OSM will be mentioned and
discussed in many different ways. In Chapter 4, Touya et al. (2017) address the
challenges of automated mapmaking using VGI as the input data, and the
authors consider OSM as a key source, but not the only source, of this VGI data.
Chapter 2 (See et al., 2017) has already indicated that there are many sources of
VGI available today. While OSM is open data and is licensed under the Open
Data Commons Open Database License (ODbL), there are privacy and ethical
issues around the reuse of OSM data. In OSM, one is free to copy, distribute,
transmit and adapt OSM data, as long as credit is provided to OSM and its
contributors. If one alters or builds upon the data, then the resultant data must also
be distributed under the same licence. Chapter 6 tackles some of these issues
for OSM and VGI in general (Mooney et al., 2017). In Chapter 8, Antoniou
and Skopeliti (2017) consider how the concept of quality has evolved in OSM
over time through the analysis of the evolution of OSM data specifications and
of OSM editors. The very evolution of, and changes over time to, the OSM
ecosystem can influence the quality of OSM data. Related to this theme, Chapter 9,
by Skopeliti et al. (2017), considers how quality in VGI can be visualised and
communicated effectively, with significant research work having already been
carried out on this topic using OSM as the case study. As discussed earlier in
this chapter, OSM has a very flexible and easy-to-understand approach to the
contribution of new geographic data or editing of existing data in the OSM
database. Chapter 10 considers best practices for VGI data collection, and
Minghini et al. (2017) propose in that chapter that the lack of protocols and the
flexibility of contribution are not necessarily a good thing in terms of
producing consistently high-quality VGI data. Chapter 11 (Bastin et al., 2017)
considers VGI data management and suggests ways in which OSM can be integrated
into the so-called Semantic Web, where all OSM’s data would be converted
to Linked Data. Finally, Chapter 13 (Olteanu-Raimond et al., 2017a) discusses
VGI and the role of NMAs, with OSM often seen as a rival or competitor to the
geographic data services provided by these agencies. As is obvious from this
overview of the remaining chapters of the book, a deep scientific discussion of
VGI is impossible without reflecting on and considering the impact and
influence of OSM. This is very likely to continue for many years to come.


3.1  The  Future  of  OSM 

OSM’s greatest strength will always be its huge pool of contributors. Thousands
of these contributors have collected and generated some of the world’s best
street and topographic data without expensive teams of professional surveyors
or world-class equipment. As the world and the urban and natural environment
change every day, OSM contributors have the ability to depict this changing
world in a map and a database that belong to them. OSM may not yet have the
advanced types of features that Google Maps has - street-view images,
multimodal navigation, social recommendations, etc. - but it may soon have them.
Mapillary⁹⁶,⁹⁷, which is a service for crowdsourcing street-level photographs using
smartphones and computer vision, has almost 70 million geotagged street-level
photographs at the time of writing. Mapillary shares the open data ethos of
OSM, and the two can work well together (Juhasz and Hochmair, 2016). Very
similarly, efforts are in place to link OSM elements with their corresponding
Wikipedia pages and Wikidata items. As an example, the WTOSM⁹⁸ (Wikipedia To
OSM) service developed by the Italian OSM community automatically
identifies Wikipedia pages that can be linked (by means of tags) to OSM elements.
Mature services such as OpenRouteService provide navigation services based
wholly on OSM’s database. One of the factors in the evolution of OSM over
the past decade or so has been the ability of the project to adapt and expand in
the face of technological advancements in other areas of ICT and Open Source
Software. Web service access to the OSM database or its mirrors has improved
and is very stable, allowing developers to build an array of applications using
the data directly from the database.


There  are  some  challenges  for  OSM  going  forward.  These  challenges  are 
a  mixture  of  factors  based  on  the  social  and  technological  aspects  of  VGI 
(Mooney,  2015).  Contributors  can  make  edits  to  the  OSM  global  database  with¬ 
out  any  real  controls  or  moderation  at  the  point  of  contribution.  Despite  the 
fact  that  there  are  many  applications  available  for  an  a  posteriori  quality  check 
(see  Section  2.7),  as  long  as  edits  can  be  made  without  initial  controls  the  issue 
of  OSM  data  quality  will  remain  a  contentious  one.  Relatively  unknown  con¬ 
tributors  from  an  unknown  crowd  supplying  geospatial  data  is  a  concern  to  end 
users  and  stakeholders  such  as  NMAs,  government  agencies  and  commercial 
companies.  There  have  been  many  instances  in  the  past  where  large  amounts 
of  OSM  data  have  been  deleted  by  new  or  inexperienced  contributors.  Some 
authors  have  considered  the  problem  of  automated  detection  of  instances  of 
vandalism  and  of  the  purposeful  deletion  of  data  in  OSM  (Neis  et  al.,  2012). 
Many  local  OSM  communities  have  long  debated  the  wish  and  need  to  imple¬ 
ment  tools  for  checking  and  approving  contributions  (e.g.  by  more  experienced 
contributors or by the community itself). However, such an implementation
would clearly go against the very nature of the OSM project, and no formal
actions are yet in place in this regard.

Several  academic  studies  have  shown  that  for  specific  regions  of  the  world, 
OSM  has  reached  a  very  high  and  mature  level  of  completeness  and  spatial 
accuracy  compared  to  data  from  sources  such  as  NMAs  (Dorn  et  al.,  2015). 
One  of  the  major  challenges  will  be  to  sustain  the  contributor  motivation  for 
editing  and  maintaining  the  OSM  database  into  the  future  (Budhathoki  and 
Haythornthwaite,  2012).  Every  day  sees  less  white  space  or  empty  places  on 
the  OSM  map.  Similar  scenarios  are  being  observed  in  Wikipedia  (Jankowski- 
Lorek  et  al.,  2016).  The  task  of  being  an  OSM  contributor  is  changing  from  that 
of  being  the  contributor  of  brand  new  geodata  to  OSM  to  that  of  map  garden¬ 
ing  (McConchie,  2016;  Sinton,  2016);  in  this  latter  case,  contributors  are  not 
necessarily  involved  in  contributing  new  material  to  OSM  but  are  attending  to 
the  upkeep  and  update  of  the  existing  geometry  and  attribute  data  (tags)  in  the 
database. 

As  geolocation  is  further  embedded  into  social  media,  user-generated  con¬ 
tent  on  the  Internet,  etc.,  issues  of  privacy  and  ethics  can  be  raised  (Blatt,  2015), 
and  the  work  outlined  in  Chapter  6  of  this  book  (Mooney  et  al.,  2017),  high¬ 
lighting  these  problems  in  relation  to  VGI,  will  become  critical;  currently,  very 
little  work  has  been  undertaken  by  the  research  community  into  privacy  and 
ethics  in  VGI.  In  the  final  chapter  of  one  of  the  first  edited  volumes  dedicated 
to  OSM,  Mooney  (2015)  advises  that  the  academic  community  has  a  significant 
role  to  play  in  the  future  of  OSM;  through  scientific  research  and  investigation, 
the  academic  community  is  encouraged  to  feed  its  results  and  experiences  back 
directly  into  the  OSM  community  and  become  more  closely  involved  in  the 
day-to-day  workings  of  the  OSM  ecosystem.  This  model  has  been  very  success¬ 
ful  in  the  open  source  software  community,  and  this  can  extend  to  the  OSM 
world. 

Abstract

The  most  common  way  to  use  geographic  information  is  to  make  maps.  With 
the  ever  growing  amount  of  Volunteered  Geographic  Information  (VGI),  we 
have  the  opportunity  to  make  many  maps,  but  only  automatic  cartography 
(generalisation,  stylisation,  text  placement)  can  handle  such  an  amount  of  data 
with  very  frequent  updates.  This  chapter  reviews  the  recent  proposals  to  adapt 
the  current  techniques  for  automatic  cartography  to  VGI  as  the  source  data, 
focusing  on  the  production  of  topographic  base  maps.  The  review  includes 
methods  to  assess  quality  and  the  level  of  detail,  which  is  necessary  to  handle 
data  heterogeneity.  The  paper  also  describes  automatic  techniques  to  general¬ 
ise,  harmonise  and  render  VGI. 

1 Introduction

Maps  are  now  everywhere,  from  the  Web  to  smartphones,  and  are  no  longer 
limited  to  paper  maps  for  hiking  or  routing.  But  most  of  the  maps  provided 
to  the  general  public  are  not  good  maps,  so  they  are  not  as  effective  as  they 
could  be.  Whether  they  are  static  or  dynamic  (i.e.  pan  and  zoom  allowed),  on 
paper  or  on  screens  of  variable  sizes,  good  maps  are  maps  where  every  feature 
is  legible,  and  where  the  user  can  easily  understand  the  geography  behind  the 
map  and  the  message  of  the  map.  Making  good  maps  manually  requires  car¬ 
tographic  skills.  However,  when  the  amount  of  data  is  huge,  for  instance  with 
the  world  OpenStreetMap  (OSM)  dataset,  mapmaking  has  to  be  automated. 
Automating  mapmaking  entails  two  steps  to  obtain  a  legible  topographic  map 
out  of  a  geographic  database:  selecting  the  data  and  the  styles  to  be  used  to 
portray  them,  and  refining  the  content  in  order  to  reach  a  legible  map,  which 
is  complex  when  scale  decreases,  as  the  space  in  which  to  put  the  map  symbols 
and  the  text  reduces.  These  steps  require  the  automation  of  three  main  pro¬ 
cesses:  map  generalisation  (the  simplification  and  abstraction  of  map  objects 
when scale decreases), text placement, and cartographic symbolisation or
stylisation. How to optimally automate such processes is still a research question,
but,  in  recent  years,  maps  have  been  more  and  more  often  produced  through 
complete  or  partial  automation.  The  traditional  actors  of  automated  mapmak¬ 
ing  are  the  national  or  regional  mapping  agencies,  the  private  map  editors  and 
the  GIS  software  vendors.  These  actors  have  been  used  to  making  their  maps 
out  of  traditional  geographic  databases,  but  what  happens  if  the  source  data 
are  partly  or  totally  derived  from  Volunteered  Geographic  Information  (VGI)? 
VGI  is  geographic  information,  and  past  studies  on  its  quality  (Girres  and 
Touya,  2010;  Haklay,  2010)  have  shown  that  it  was  satisfactory  for  many  uses, 
but  quite  heterogeneous.  Thus,  the  methods  used  for  automated  mapmaking 
should  not  be  disrupted  by  the  use  of  VGI  as  an  input,  but  these  methods  need 
some  adjustment  to  adapt  to  this  new  source  of  data:  this  adjustment  is  the 
topic of this chapter. Most of the problems presented here have been studied
for the automated cartography of OSM, but we believe these problems and the
proposed  solutions  also  apply  to  different  VGI  sources,  and  even  to  cases  where 
several  VGI  sources  are  combined  into  a  map. 

The  next  section  of  this  chapter  discusses  the  reasons  why  traditional  auto¬ 
mated  mapping  processes  are  not  fully  adapted  to  VGI,  and  is  followed  by  a 
section  that  describes  attempts  to  solve  these  problems  by  inferring  the  level 
of  detail  of  VGI  features.  The  fourth  section  then  focuses  on  map  generalisa¬ 
tion,  which  may  be  the  most  complex  of  the  cartographic  processes.  In  the 
fifth section, the level of detail harmonisation needed for large scale maps
is discussed, while generalisation is dedicated to medium or small scale maps.
The  sixth  part  of  the  chapter  focuses  on  the  assessment  of  the  quality  of  map 
features  prior  to  applying  automatic  processes.  Finally,  in  the  seventh  part,  the 
issues  related  to  advanced  map  stylisation  with  VGI  are  discussed. 


2  Why  Are  Traditional  Automated  Mapping  Processes 
Not  Fully  Adapted  to  VGI? 

Traditional  automated  mapping  processes  have  been  developed  to  process 
authoritative  datasets,  or  at  least  datasets  with  consistent  and  homogeneous 
specifications,  which  is  clearly  not  the  case  when  VGI  is  used  as  (one  of)  the 
map  source(s).  The  first  problem  is  that  VGI  datasets  suffer  from  level  of  detail 
(LoD)  heterogeneities.  For  instance,  there  is  no  LoD  specification  in  OSM, 
which  allows  contributors  a  great  deal  of  freedom  in  capturing  either  detailed 
features  (e.g.  the  cadastral  LoD  buildings  from  Figure  1)  or  less  detailed  features 
(e.g.  the  rough  built-up  areas  or  lake  outlines  in  Figure  1)  depending  partly  on 
their  skills  but  mostly  on  the  data  source,  as  precise  GPS  tracks  allow  more 
precision  than  low-resolution  satellite  imagery.  This  heterogeneity  leads  to  LoD 
inconsistencies,  i.e.  some  very  detailed  features  and  some  less  detailed  features 
might  coexist  on  a  map  and  share  spatial  relations  (Figure  1).  Maps  produced 
by  National  Mapping  Agencies  (NMAs),  on  the  other  hand,  are  based  on  data¬ 
sets  with  strict  specifications,  where  all  features  share  the  same  geometrical 
resolution  or  granularity,  whether  they  belong  to  the  same  theme  or  not.  Thus 
the  processes  used  to  automate  the  production  of  such  high-quality  maps  are 
not  capable  of  handling  the  inconsistencies  shown  in  Figure  1. 

Fig. 1: Examples of LoD inconsistency in OSM. On the left, the rough built-up
areas/forest limits intersect detailed buildings; on the right, detailed footpaths
lie on the surface of a roughly digitised lake. ©OpenStreetMap contributors.

The main characteristic of VGI compared to traditional authoritative datasets
is the heterogeneity of quality, with contributions ranging from very good to
very bad. This is true for most types of VGI: for OSM first and foremost, as
shown in seminal studies by Girres and Touya (2010) and Haklay (2010), but also
for photo sharing platforms such as Flickr (Zielstra and Hochmair, 2013), or
even for hiking route sharing platforms (Ivanovic et al., 2015). Data quality
varies from theme to theme, but also from feature to feature in the same theme
(Girres and Touya, 2010). This differs markedly from authoritative datasets,
where data quality is homogeneous and cartography processes are developed in
adaptation to this known quality. Among the quality indicators that can be
heterogeneous with VGI, the most significant components are positional
accuracy, thematic accuracy, completeness and logical consistency:

•  Positional  accuracy  heterogeneity  is,  of  course,  a  problem  because  it  can 
increase  the  symbol  overlap  problems  faced  when  creating  a  small  scale 
map.  Heterogeneity  in  positional  accuracy  might  drive  mapmakers  to  use 
incompatible  features  on  the  same  scale,  which  can  give  a  false  picture  of 
the  reality  and  the  relations  among  features. 

•  Heterogeneity  in  thematic  accuracy  is  a  problem  because  automated  car¬ 
tography  relies  on  thematic  information  to  classify  the  map  features.  The 
consequence  of  such  heterogeneity  is  that  processes  should  rely  more  on 
geometry  and  only  use  semantics  when  available. 

•  Completeness  heterogeneity  raises  the  problems  of  ‘empty  space’  in  the  map. 
Empty  spaces  are  useful  to  identify  in  automated  mapmaking  because  they 
are  excellent  candidates  to  solve  space  conflicts  during  map  generalisation 
or  text  placement.  But,  with  VGI,  empty  might  either  mean  really  empty  or 
just  incomplete. 

•  Logical  consistency  heterogeneity  is  also  a  problem,  because  automated  car¬ 
tography  uses,  for  instance,  the  topology  of  geographic  networks  to  identify 
important  features,  and  road  symbolisation  techniques  require  topologi¬ 
cally  correct  networks. 

Traditional  NMA  maps  cover  the  classic  themes  of  topographic  maps,  or  road 
maps,  and  most  automated  mapmaking  processes  focus  on  roads,  buildings, 
hydrography,  relief  or  vegetation.  VGI  has  a  broader  range  of  contributed  geo¬ 
graphic  features;  even  OSM,  which  started  as  a  free  alternative  to  topographic 
maps,  has  been  extended  to  cover  amenities,  shops  or  addresses.  Thus  an  auto¬ 
mated  process  to  make  maps  with  VGI  needs  to  handle  unusual  themes  as  well 
as  classic  road  and  building  datasets. 

Another particularity of VGI is the broader range of scales used to describe
the world, from world views at very small scales (smaller than
1:100 000 000) to very large scales. For instance, OSM suggests the capture
of zebra crossings or traffic signals that can only be displayed at very large
scales.  Some  projects  even  extend  the  OSM  framework  to  indoor  mapping 
(Goetz  and  Zipf,  2011).  In  contrast,  traditional  automated  mapmaking  targets 
a  small  number  of  fixed  scales  (Duchene  et  al.,  2014),  and,  even  when  the  maps 
are displayed in online tools, the number of scales available is often limited by
the number of scales available for paper maps (Dumont et al., 2016). In addition
to the issue of the large range of scales in VGI, it should be noted that most
of the automated processes were never developed for scales as large or as small
as those found in VGI (e.g. the smallest scale produced by the French NMA
only covers the whole French territory, excluding overseas territories).

Regarding  symbology  and  stylisation,  the  automated  processes  are  strongly 
related  to  the  data  and  semantics.  For  instance,  the  choice  of  road  symbols 
depends  on  the  semantics  of  the  road,  and  there  has  to  be  some  consistency  all 
along  the  road.  When  manipulating  VGI  data,  how  do  we  acquire  these  seman¬ 
tics?  How  do  we  handle  the  heterogeneities  inherent  to  VGI? 


3  Inferring  LoD  in  VGI 
3.1  LoD  or  Scale? 

In cartography, the scale of a map is the ratio of the length of an object on the
map to the length of the same object on the ground. But scale is also somehow
related  to  map  usage,  and  is  then  a  proxy  for  map  content.  Maps  around  the 
scale  of  1:25k  are  mainly  used  for  hiking  and  contain  information  readable  at 
this  scale  and  useful  for  this  purpose  (e.g.  footpaths,  contour  lines  etc.);  maps 
with  a  scale  smaller  than  1:500k  are  mainly  used  for  road  trips,  and  highlight 
the  map  themes  related  to  roads.  In  contrast,  it  is  too  complex  to  assign  a  scale 
to  VGI  features,  but  here  we  consider  the  scale  of  a  feature  as  the  scale  of  the 
map  at  which  this  feature  would  be  legible  and  legitimate. 

LoD  is  a  vaguer  notion,  which  can  be  considered  as  the  translation  of  map 
scale  to  geographic  databases  for  which  the  scale  is  not  fixed.  Several  factors 
affect  the  level  of  detail  of  geographic  features: 

•  geometric  resolution,  i.e.  the  minimum  distance  between  two  vertices  of  the 
geometry,  as  an  analogy  with  image  resolution. 

•  geometric  precision,  i.e.  the  difference  between  the  position  in  the  database 
and  the  position  in  reality. 

•  granularity,  i.e.  the  size  of  the  smallest  details  in  a  geometry,  such  as  the 
protrusions  in  the  church  in  Figure  2  (left). 

•  semantic  resolution,  i.e.  the  amount  of  details  in  the  semantic  information 
attached  to  the  geometric  feature. 

•  conceptual  schema,  i.e.  how  much  the  ground  truth  information  is  abstracted; 
for  instance,  a  wood  abstracted  by  individual  trees  is  more  detailed  than  one 
abstracted  by  a  polygon  feature. 
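
Two of the factors above lend themselves to direct measurement. The following is a minimal sketch in Python, assuming planar coordinates in metres and features given as plain lists of (x, y) tuples; the function names are hypothetical, "two vertices" is interpreted here as two consecutive vertices, and counting tags is only a crude proxy for semantic resolution.

```python
import math


def geometric_resolution(coords):
    """Smallest distance between two consecutive vertices of a line or ring."""
    return min(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def semantic_resolution(tags):
    """Crude proxy (an assumption): number of non-empty tags on the feature."""
    return sum(1 for value in tags.values() if value)


# A building outline with small protrusions versus a roughly digitised area.
detailed = [(0, 0), (2, 0), (2, 1), (3, 1), (3, 4), (0, 4), (0, 0)]
rough = [(0, 0), (30, 0), (30, 40), (0, 40), (0, 0)]

print(geometric_resolution(detailed))  # short edges: fine resolution
print(geometric_resolution(rough))     # long edges: coarse resolution
```

Geometric precision and conceptual schema cannot be derived from a single geometry in this way, since they require a reference dataset or knowledge of the data model.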

Thus it is difficult to infer LoD as a numerical value as one would for scale, so
categories are often used, such as the LoD for 3D city models (Biljecki et al.,
2014). Touya and Brando-Escobar (2013) proposed five categories for the LoD
of OSM features, from Street level to Country level. Scales can then be assigned
to features if a scale range is assigned to each LoD category, e.g. the city level
is assigned a scale range going from 1:15k to 1:50k (Touya and Reimer, 2015).

Fig. 2: Two churches with a similar granularity on the ground that are captured
with a different LoD: the left-hand one is captured from a scanned cadastre
map and the right-hand one from Bing imagery. ©IGN, France.


3.2  Reverse  Engineering  Scale  Equivalency 

Reimer  et  al.  (2014)  inferred  a  scale  equivalency  for  OSM  features  by  studying 
the  characteristics  of  features  in  existing  maps  at  different  scales:  for  a  given 
map  theme,  the  measure  that  best  characterises  the  difference  in  features  at 
different  scales  is  determined.  In  the  example  of  urban  areas  in  Reimer  et  al. 
(2014),  vertex  frequency  (number  of  vertices  in  the  polygon  ring  divided  by 
the  polygon  perimeter)  was  the  determining  characteristic  (Figure  3).  Then, 
by inverting Töpfer’s radical law (Töpfer and Pillewizer, 1966), which defines
the optimal number of map features at a scale given their number at a larger
scale,  and  applying  it  to  existing  map  features  in  the  maps  of  NMAs,  Reimer 
et  al.  (2014)  were  able  to  calculate  the  scale  equivalency  of  any  urban  area 
in  OSM. 
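
To make the two ingredients concrete, the sketch below computes vertex frequency as defined above and applies the basic form of the radical law, n_target = n_source * sqrt(M_source / M_target), where M is the scale denominator, together with its inverse. It is only an illustration under these assumptions: the function names are hypothetical, and the exact way the law is calibrated against NMA maps in Reimer et al. (2014) is not reproduced here.

```python
import math


def vertex_frequency(ring):
    """Vertices per unit of perimeter for a closed ring (first point == last)."""
    perimeter = sum(math.dist(a, b) for a, b in zip(ring, ring[1:]))
    return (len(ring) - 1) / perimeter


def radical_law(n_source, m_source, m_target):
    """Number of features to keep at scale 1:m_target, given n_source at 1:m_source."""
    return n_source * math.sqrt(m_source / m_target)


def inverse_radical_law(n_source, m_source, n_target):
    """Scale denominator at which n_source features would reduce to n_target."""
    return m_source * (n_source / n_target) ** 2


square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
print(vertex_frequency(square))                # 4 vertices over a 40 m perimeter
print(radical_law(100, 25_000, 100_000))       # features kept from 1:25k to 1:100k
print(inverse_radical_law(100, 25_000, 50))    # scale at which 100 features become 50
```

Using the law in the inverse direction turns an observed count (or, by analogy, an observed vertex frequency) into an equivalent scale, which is the idea behind the scale equivalency.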


3.3  Multiple  Criteria  Decision  Method 

We stated in Section 3.1 that LoD can be affected by a combination of five
factors, all of which can be measured in a geographic dataset but are difficult
to compare or to add together. Multi-criteria decision methods are
computational techniques that allow decision-making based on several criteria in
those cases where a simple numerical value such as a mean is not a valid
solution (Roy, 2005). Touya and Brando-Escobar (2013) propose a multi-criteria
decision method to classify VGI features into LoD categories from street to
country level. The method was improved by integrating elements from the
scale equivalency in Touya and Reimer (2015). Some automatic results from
the improved method are presented in Figure 4.

Fig. 3: Vertex frequency differences for urban areas in existing maps in France.

Fig. 4: Results of the automatic inference of LoD with the improved method
from Touya and Reimer (2015) for OSM built-up areas in Tunisia (left) and
OSM forest areas in France (right). ©OpenStreetMap contributors.


4  Map  Generalisation  of  VGI 

4.1  Current  Generalisation  in  OpenStreetMap 

Map  generalisation  is  a  complex  process  that  simplifies  and  abstracts  geo¬ 
graphic  information  to  produce  a  legible  map  at  a  given  (smaller)  scale.  The 
problem  of  map  generalisation  automation  has  attracted  research  propos¬ 
als  for  many  years  (see  for  instance  Burghardt  et  al.,  2014;  Mackaness  et  al., 
2007),  and  some  mapping  agencies  are  now  able  to  use  research  results  to  pro¬ 
duce  maps  with  partial  or  total  automation  (Duchene  et  al.,  2014).  One  of 
the remaining challenges of automated generalisation research is to extend the
current  processes  to  make  maps  with  VGI  or  maps  that  combine  authoritative 
and  user  generated  information. 

If  we  look  at  the  default  maps  available  from  OSM,  there  is  almost  no  gen¬ 
eralisation  operation  carried  out  on  them.  This  is  partly  due  to  the  philosophy 
of  the  OSM  portal,  which  aims  to  show  the  content  of  the  dataset  rather  than 
to  display  the  best  map  possible.  But  it  is  also  due  to  the  difficulty  of  the  gener¬ 
alisation  process,  which  involves  complex  mechanisms  that  are  not  available  in 
most  mapping  tools.  However,  some  minimal  selection  operations  are  carried 
out  in  the  default  OSM  map,  using  the  semantics  available  to  choose  the  zoom 
levels (i.e. scales) where features should be displayed. The piece of code below
is extracted from the CartoCSS file used to render buildings in the default OSM
map. It shows that standard buildings are displayed only at zoom levels of 13
or more (zoom levels are ordered from 0 for the whole world to 19 in OSM), and
with a coloured outline at zoom levels of 15 or more.

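    #buildings {
      /* From zoom 13: flat fill only, no outline */
      [zoom >= 13] {
        polygon-fill: @building-low-zoom;
        polygon-clip: false;
      }
      /* From zoom 15: regular fill plus a thin outline */
      [zoom >= 15] {
        line-color: @building-line;
        polygon-fill: @building-fill;
        line-width: .75;
        line-clip: false;
      }
    }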
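
Since zoom levels stand in for scales here, it can help to see roughly which scale a zoom level corresponds to. The sketch below is an approximation under stated assumptions: Web Mercator tiles of 256 pixels, a nominal screen pixel of 0.28 mm, and hypothetical function names. Real screens differ in pixel density, so the values are indicative only.

```python
import math

EARTH_RADIUS_M = 6378137.0   # equatorial radius used by Web Mercator
TILE_SIZE_PX = 256           # assumed tile size
PIXEL_SIZE_M = 0.00028       # assumed nominal screen pixel (0.28 mm)


def ground_resolution(zoom, latitude_deg=0.0):
    """Metres on the ground covered by one screen pixel."""
    equator_m = 2 * math.pi * EARTH_RADIUS_M
    shrink = math.cos(math.radians(latitude_deg))  # Mercator scale varies with latitude
    return equator_m * shrink / (TILE_SIZE_PX * 2 ** zoom)


def scale_denominator(zoom, latitude_deg=0.0):
    """Approximate map scale denominator for a zoom level."""
    return ground_resolution(zoom, latitude_deg) / PIXEL_SIZE_M


for zoom in (13, 15, 19):
    print(zoom, round(scale_denominator(zoom)))
```

Under these assumptions, zoom 13 is close to 1:68 000 and zoom 15 close to 1:17 000 at the equator, and each additional zoom level halves the scale denominator.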

Besides  these  minimal  selection  operations,  there  are  very  few  proposals  dedi¬ 
cated  to  the  issues  of  generalising  VGI  at  present  (Sester  et  al.,  2014).  Klam¬ 
mer  (2013)  proposed  some  solutions  for  tile-based  maps  such  as  OSM,  with 
each  tile  being  generalised  separately,  but  potential  problems  at  tile  junctions 
are  not  handled:  generalisation  often  requires  an  analysis  of  the  neighbouring 
objects,  which  is  not  possible  at  the  edge  of  the  tiles.  Schmid  and  Janetzek 
(2013)  proposed  to  generalise  the  OSM  road  network  at  small  scales  on-the- 
fly  using  important  placenames  in  the  dataset.  However,  most  of  the  issues 
remain  unsolved:  how  can  we  deal  with  the  broad  range  of  scales  in  generali¬ 
sation  processes,  with  the  diversity  of  themes  or  with  the  heterogeneities  in 
quality  and  LoD? 

The  next  two  subsections  address  issues  related  to  the  range  of  scales  and  the 
diversity  of  themes  with  the  generalisation  of  complex  airports  and  railways 
from  OSM.  Section  4.4  addresses  the  generalisation  of  mashup  maps  with  user 
generated  content  on  top  of  reference  datasets. 


4.2 Generalisation of Complex Airports

Airports  can  be  described  in  a  great  amount  of  detail  in  OSM,  and  contributors 
often  use  the  OSM  recommendations  to  capture  airports  as  complex  objects 
composed  of  runways,  aprons  where  planes  are  parked,  taxiways  that  connect 
aprons  and  runways,  and  terminal  buildings.  Figure  5  shows  that  such  a  com¬ 
plex  structure  is  hard  to  represent  legibly  when  the  scale  decreases,  so  generali¬ 
sation  algorithms  dedicated  to  such  structures  must  be  used. 

This  subsection  briefly  describes  a  generalisation  process  presented  in  Touya 
and  Girres  (2014),  where  algorithms  for  the  different  types  of  features  com¬ 
prising  airports  are  proposed,  including,  for  instance,  the  decomposition  of 
runways  from  polygons  to  lines.  Here,  we  choose  to  focus  on  taxiway  lines. 
Figure  5  shows  that  the  junctions  of  taxiways  are  often  complex,  with  shapes 
similar  to  slip  roads.  The  first  step  in  generalisation  is  to  automatically  char¬ 
acterise  all  of  these  complex  junctions  (see  the  coloured  polygons  on  the  right 
side  of  Figure  6)  using  the  shapes  of  the  lines,  the  angles  of  the  connection  and 
the  number  of  connected  taxiways.  Then,  each  complex  junction  is  simplified 
to a straight line crossing, removing all of the slip roads (Figure 6). Finally,
strokes are computed within the remaining taxiways. Strokes are groups of lines
that  follow  the  perceptual  grouping  principle  of  good  continuity  (Thomson  and 
Richardson,  1999),  like  a  continuous  pen  stroke,  and  have  been  used  to  simplify 
roads  or  rivers  in  the  generalisation  literature.  Here,  the  smallest  strokes  are 
eliminated  with  a  length  threshold  depending  on  map  scale. 
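
The stroke step can be illustrated with a simplified sketch. It is not the implementation of Touya and Girres (2014): it assumes taxiway sections are open polylines given as lists of (x, y) tuples that share exactly equal end coordinates where they connect, pairs sections greedily at each node by smallest deflection angle, and uses hypothetical names and threshold values.

```python
import math
from collections import defaultdict


def _direction(seg, node):
    """Unit vector leaving `node` along the first or last edge of `seg`."""
    a, b = (seg[0], seg[1]) if seg[0] == node else (seg[-1], seg[-2])
    length = math.dist(a, b)
    return ((b[0] - a[0]) / length, (b[1] - a[1]) / length)


def _deflection(seg1, seg2, node):
    """Deflection angle in degrees at `node` (0 means straight on)."""
    d1, d2 = _direction(seg1, node), _direction(seg2, node)
    dot = max(-1.0, min(1.0, d1[0] * d2[0] + d1[1] * d2[1]))
    return 180.0 - math.degrees(math.acos(dot))


def build_strokes(segments, max_deflection=30.0):
    """Group section indices into strokes following good continuity."""
    at_node = defaultdict(list)
    for i, seg in enumerate(segments):
        at_node[seg[0]].append(i)
        at_node[seg[-1]].append(i)

    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for node, ids in at_node.items():
        # Candidate continuations at this node, straightest first.
        pairs = sorted(
            (_deflection(segments[i], segments[j], node), i, j)
            for k, i in enumerate(ids) for j in ids[k + 1:]
        )
        used = set()
        for angle, i, j in pairs:
            if angle <= max_deflection and i not in used and j not in used:
                parent[find(i)] = find(j)
                used.update((i, j))

    strokes = defaultdict(list)
    for i in range(len(segments)):
        strokes[find(i)].append(i)
    return list(strokes.values())


def prune_short_strokes(segments, min_length, max_deflection=30.0):
    """Keep only strokes whose total length reaches `min_length`."""
    def length(seg):
        return sum(math.dist(a, b) for a, b in zip(seg, seg[1:]))

    return [
        [segments[i] for i in stroke]
        for stroke in build_strokes(segments, max_deflection)
        if sum(length(segments[i]) for i in stroke) >= min_length
    ]


# A straight taxiway in two sections, plus a short perpendicular spur.
taxiways = [[(0, 0), (100, 0)], [(100, 0), (200, 0)], [(100, 0), (100, 20)]]
print(build_strokes(taxiways))
print(prune_short_strokes(taxiways, min_length=50))
```

In practice the length threshold would be tied to the target map scale, as described above; here it is simply passed in as a parameter.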

When  algorithms  for  taxiways,  runways,  aprons  and  terminals  (see  Touya 
and  Girres,  2014)  are  chained,  complete  airports  can  be  generalised;  the  results 
for  OSM  airports  with  different  initial  complexities  are  presented  in  Figure  7, 
showing  that  the  flexibility  of  the  algorithms  allows  for  the  management  of  LoD 
heterogeneity  of  OSM  data. 


Fig. 5: The complexity of OSM airports composed of terminals, aprons, taxiways
and runways, and their representation at several zoom levels.
©OpenStreetMap contributors.

Fig. 6: Identification of different types of taxiway junctions (in red, pink and
blue) and their simplification. ©OpenStreetMap contributors.

Fig. 7: 1:25k generalisation of airports of different initial complexities.
©OpenStreetMap contributors.


4.3  Generalisation  of  Railway  Networks 

Airports are not the only geographic features captured with great
complexity in OSM. The OSM specifications advise capturing each railway track,
even in train stations or in marshalling yards where a great number of tracks may exist
(Figure  8).  The  railway  lines  are  often  very  close  to  each  other  and  their  symbols 
overlap  very  quickly  when  the  scale  decreases.  In  this  case,  a  good  generalisa¬ 
tion  process  is  able  to  handle  different  densities  of  parallel  railways  and  simplify 
them  while  preserving  the  connections  and  the  patterns  of  the  railways. 

Fig. 8: A complex train station in OSM, with all of the railways captured. ©OpenStreetMap contributors.

Railway networks are composed of two very different types of patterns: the
main  railway  lines  with  a  small  number  of  parallel  tracks,  and  the  train  station 
with  complex  structures  of  tracks.  The  best  strategy  is  to  handle  both  parts  of 
the  network  separately  with  different  methods  (Touya  and  Girres,  2014;  Savino 
and  Touya,  2015).  The  simplest  railways  to  generalise  are  the  main  railway  lines: 
the  parts  where  several  railway  tracks  are  close  and  parallel  have  to  be  identified 
automatically  and  then  replaced  by  a  single  track  when  the  symbols  overlap 
(Savino  and  Touya,  2015).  The  results  of  this  method  for  railways  extracted 
from  OSM  in  France  are  presented  in  Figure  9. 
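
The condition "when the symbols overlap" can be expressed as simple arithmetic: a line symbol of a given width on the map covers a ground width equal to that width multiplied by the scale denominator. The sketch below is an illustration only, not the method of Savino and Touya (2015); the function names and the 0.4 mm symbol width are assumptions, and the collapse step naively averages two tracks that are assumed to have the same number of vertices, digitised in the same direction.

```python
def symbol_footprint_m(symbol_width_mm, scale_denominator):
    """Ground width (metres) covered by a line symbol at a given map scale."""
    return symbol_width_mm / 1000.0 * scale_denominator


def symbols_overlap(track_gap_m, symbol_width_mm, scale_denominator):
    """True when two parallel tracks are closer than one symbol width on the map."""
    return track_gap_m < symbol_footprint_m(symbol_width_mm, scale_denominator)


def collapse_to_centreline(track_a, track_b):
    """Average two tracks vertex by vertex (assumes matching vertex counts)."""
    if len(track_a) != len(track_b):
        raise ValueError("tracks must have the same number of vertices")
    return [((ax + bx) / 2, (ay + by) / 2)
            for (ax, ay), (bx, by) in zip(track_a, track_b)]


track_a = [(0, 0), (100, 0)]
track_b = [(0, 5), (100, 5)]   # parallel track 5 m away

for denominator in (5_000, 25_000):
    if symbols_overlap(5.0, 0.4, denominator):
        print(denominator, "collapse:", collapse_to_centreline(track_a, track_b))
    else:
        print(denominator, "keep both tracks")
```

With these assumed values the two tracks remain distinct at 1:5k but are merged at 1:25k, where a 0.4 mm symbol covers about 10 m on the ground.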

Regarding  train  stations,  a  typification  operation  is  required.  Typification 
simplifies  a  pattern  of  geographic  features  while  preserving  the  characteristics 
of  the  pattern  more  than  the  position  of  the  features  taken  individually.  Sev¬ 
eral  complementary  typification  algorithms  are  proposed  in  Touya  and  Girres 
(2014)  and  Savino  and  Touya  (2015),  and  Figure  10  shows  a  result  for  a  1:25k 
map  of  a  small  train  station. 


4.4  Generalisation  of  a  Combination  of  Authoritative  Data  and  VGI 

When VGI is used as a thematic layer on top of a map, as in Figure 11,
which is extracted from the IGN application called ‘Leisure area’, the issues
related to generalisation are different from those related to generalisation
of VGI only. The background map can be generalised almost as a traditional
topographic map, but the constraint is the preservation of the relations
between the thematic layers and the background layers. If we use the example
of Figure 11, the route should remain on top of the road, even if the road
is generalised, which is likely to happen given the sharp bends at the top of
the figure. Another example in Figure 11 is the spot of interest marked as
no. 2 in the figure, which is located on the summit of a large bend: if the bend
is displaced by generalisation, which is a common side effect, the symbol
should be adjusted accordingly.

Fig. 9: Main railways with parallel lanes collapsed to single lanes (Savino and
Touya, 2015). ©OpenStreetMap contributors.

Fig. 10: 1:25k map generalisation of a small train station (Touya and Girres,
2014). ©OpenStreetMap contributors.

Fig. 11: Example of a crowdsourced bike route displayed on top of an IGN 1:25k
topographic map, from the ‘Espace loisirs IGN’ application. ©IGN, France.

When  the  scale  decreases,  Duchene  (2014)  states  that  such  spatial  relations 
should  either  be  preserved  or  sometimes  be  abstracted  to  make  them  leg¬ 
ible  and  understandable  at  the  generalised  scale.  To  enable  this  preservation 
or  abstraction,  the  relevant  spatial  relations  must  be  discovered  and  properly 
characterised, which is not an easy task, although proposals exist to model
these  relations  (Jaara  et  al.,  2014)  with  the  introduction  of  implicit  features  such 
as  bend  summits,  or  to  build  an  ontology  of  such  spatial  relations  relevant  for 
cartography  (Touya  et  al.,  2014). 


5  LoD Harmonisation for Large Scale Maps

5.1  How Can the LoD Increase?

At large scales, e.g. maps at a 1:10k scale, there is no visualisation limitation
for the very detailed features existing in OSM, and, as a consequence, map
generalisation is not necessary. For instance, the very detailed railway
networks described in Section 4 can have all of their lanes displayed without
symbol overlaps at large scales. But the LoD inconsistencies illustrated in
Figure 1 raise the problem of the representation of roughly digitised features
at large scales. Most of the geographic meaning of maps is conveyed by
relations between map features (Mackaness et al., 2014), so efforts to resolve
LoD inconsistencies should focus on those relations that convey a specific
meaning.

Following the ideas of Monmonier (1996), the way to increase the LoD
of roughly digitised features is to caricature them in order to transform the
improbable relations of features into probable relations. For the examples in
Figure 12, a clearing would be introduced around the group of buildings, and
the bus stop would be moved to the closest road. We call this operation, which
artificially increases the LoD through probable spatial relations, LoD
harmonisation (Touya and Baley, in press). However, there is no clue in the
data as to the real shape of the clearing required in Figure 12: we only know
that there must be one. This makes harmonisation tend more towards caricature
and schematic mapping than towards realistic mapping. The map does not present
real and precise shapes to the reader, but rather presents very probable
spatial relations. The next section briefly describes some harmonisation
operations and shows some results of their implementation on OSM data, while
Section 5.3 discusses the problem of automatically chaining these harmonisation
operations on a complete large scale map.


Fig. 12: (a) This automatically identified group of buildings should not be
inside the forest. (b) The automatically identified bus stop (highlighted by
the red cross) is too far from a road. ©OpenStreetMap contributors.


5.2  Harmonisation  Operations 

Different types of harmonisation operations are described by Touya and Baley
(in press), and some of these are presented in this subsection. First, OSM
contains some polygon features that represent functional sites such as schools,
hospitals or commercial areas, which are themselves composed of other
features also represented in OSM: buildings, roads, paths, parks, sports fields
or helipads. For a clear understanding of what these zones mean in the map, the
components should really be contained by the polygon, which is not always
the case because the components are sometimes much more detailed than the
zone itself. In this case, the harmonisation operation identifies the components
that lie outside the zone and modifies the zone geometry so that it includes the
missing components (Figure 13).
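
The detection step of this operation can be illustrated with a minimal sketch.
It assumes that the zone is a simple polygon given as a list of (x, y)
vertices and that each component is reduced to its vertices; the function
names and the sample coordinates are hypothetical, and the ray-casting test is
a generic technique rather than the implementation of Touya and Baley. The
geometry modification itself would normally be delegated to a geometry library
and is not shown.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; points exactly on the boundary are not handled specially."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def components_outside(zone, components):
    """Return the names of components with at least one vertex outside the zone."""
    return [
        name
        for name, vertices in components.items()
        if not all(point_in_polygon(v, zone) for v in vertices)
    ]


# Hypothetical hospital zone and two of its components.
hospital_zone = [(0, 0), (10, 0), (10, 10), (0, 10)]
components = {
    "main_building": [(2, 2), (5, 2), (5, 5), (2, 5)],
    "access_road": [(8, 5), (14, 5)],
}
outside = components_outside(hospital_zone, components)
```

Here only the access road is reported, so it is the component that the zone
polygon would need to be extended to cover.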

A similar problem might occur with land use/cover parcels that are often
roughly digitised and some geographic features that should be inside the
parcels. The most common example in OSM is the case of urban areas with
buildings intersecting their limits or lying just outside. In such cases, the
land use parcel geometry is extended by uniting the protruding geometries of
the buildings just outside the area limits with the urban area geometry. The
method is iterative, because new buildings can be found just outside once the
geometry has been extended (see automatic results in Figure 14).
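
The iterative nature of this extension can be sketched as follows. This is a
deliberately simplified illustration: the urban area and the buildings are
modelled as axis-aligned rectangles (xmin, ymin, xmax, ymax) instead of real
polygons, and `max_gap` is a hypothetical threshold for "just outside". It
only shows why the process must be repeated until nothing changes.

```python
def gap(a, b):
    """Distance between two axis-aligned rectangles; 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def extend_urban_area(area_parts, buildings, max_gap):
    """Absorb buildings close to the area until a pass adds nothing new."""
    area = list(area_parts)
    remaining = list(buildings)
    changed = True
    while changed:
        changed = False
        for building in list(remaining):
            if any(gap(building, part) <= max_gap for part in area):
                area.append(building)       # unite the building with the area
                remaining.remove(building)
                changed = True
    return area, remaining


# Hypothetical data: b2 only becomes "just outside" once b1 has been absorbed.
urban_area = [(0, 0, 10, 10)]
b1 = (11, 2, 13, 4)
b2 = (14, 2, 16, 4)
b3 = (30, 30, 32, 32)
extended, left_out = extend_urban_area(urban_area, [b2, b1, b3], max_gap=1.5)
```

In this example the first pass absorbs b1, a second pass then absorbs b2, and
the isolated building b3 is left out.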

Another type of necessary harmonisation operation is disambiguation,
which aims to remove spatial relations that should not exist in reality without
knowing what the reality looks like. For instance, it is extremely unlikely to
find a group of close buildings inside a forest without a clearing. When the
forest has been roughly digitised and the buildings have a high LoD, we can
infer the presence of a clearing and try to add it in the forest. The proposed
operation determines where the overlaps exist between the buildings and the
forest and then crops the newly created clearing with the edges of the network
elements, which are often barriers for forests (Figure 15).


Fig. 13: The hospital zone is harmonised by extending the polygon to include
all access roads. ©OpenStreetMap contributors.


Fig. 14: The roughly digitised OSM urban area is distorted to include the
buildings directly nearby. ©OpenStreetMap contributors.


Fig. 15: The roughly digitised forest (1) contains a set of overlapping
buildings (2), and the newly created clearing is cropped (3) by network
sections that often mark the limits of clearings/forests. ©OpenStreetMap
contributors.


More  examples  of  useful  harmonisation  operations  can  be  found  in  Touya 
and  Baley  (in  press). 


5.3  How  to  Chain  Harmonisation  Operations 

Harmonisation operations are the building blocks for deriving LoD harmonised
large scale maps, but they are not enough, because several problems can
occur:

•  Harmonisation operations carried out on close parts of the map can affect
each other and the last one can damage the previous harmonisations.

•  Harmonisation operations that displace or distort features can cause
legibility problems with other features of the map (e.g. symbol overlap).

•  Harmonisation operations can be related to each other and the order of
operations might have an impact; for instance, a displacement of a building
that overlaps a riverbank (Figure 16) might put the building just outside
the urban area, so the extension of the urban area should be implemented
afterwards.
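
The ordering issue in the last point above can be made concrete with a small
sketch. It assumes that the dependencies between operations are known in
advance and uses a topological sort from the Python standard library
(Python 3.9 or later); the operation names are hypothetical, and this is only
one simple way to reason about order, not the approach adopted in this
chapter.

```python
from graphlib import TopologicalSorter

# Hypothetical operation names. Each key lists the operations that must
# have been applied before it.
depends_on = {
    "extend_urban_area": {"displace_buildings_off_riverbank"},
    "add_forest_clearing": set(),
    "displace_buildings_off_riverbank": set(),
}

# static_order() yields every operation after all of its prerequisites.
order = list(TopologicalSorter(depends_on).static_order())
```

Any order produced this way applies the building displacement before the
urban area extension, which is the constraint described in the example.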

Similar problems occurred in the automation of map generalisation, where
individual algorithms were developed first and then combined into
complex processes (Harrie and Weibel, 2007; Regnauld et al., 2014). To
harmonise the area shown in Figure 16, where multiple buildings overlap a
riverbank, we therefore used an optimisation process inspired by map
generalisation (Harrie, 1999; Sester, 2005), which combines the harmonisation
of buildings that are close to each other into a least squares adjustment.
Figure 16 shows that for each group of close buildings identified, all
buildings have been jointly displaced, avoiding symbol overlap with the river
and with other buildings.


Fig. 16: 1) Detection of LoD inconsistencies (in this case a building
intersecting the riverbank); 2) clusters of close buildings are created around
the identified inconsistencies; 3) each cluster is harmonised as a whole to
remove overlaps without creating new ones. ©OpenStreetMap contributors.


6  Quality  Assessment  Taking  into  Account  Crowdsourced 
Ground  Truth  Data 

As mentioned in Section 2, automatic mapmaking processes require some
consistency in data quality, or some kind of assessment of this quality if
consistency is not achievable, which is the case with VGI. This section
describes a study to assess the quality of OSM features using ground truth
data. In many studies, OSM is used as a proxy for VGI data; this study is no
exception, as OSM is a prime source of vector-encoded GI that can be directly
used in cartographic processes. However, any effort in mapmaking using VGI data
should expand its horizons to include other sources as well. Today, VGI comes
from different sources and in many flavours, such as toponyms, GPS tracks,
geotagged photographs, synchronous micro-blogging, social networking content,
blogs, gaming spaces, sensor measurements, etc. All of these sources can
either offer valuable geographic information complementary to OSM
data (e.g. Geonames can provide a supplementary dataset to the OSM places)
or be used as quality assessment tools (e.g. through the use of geotagged
photographs from photo-sharing repositories). This latter case is the focus of
this section.

Geotagged photographs are, in a sense, in-situ observations of the ground
reality and thus, if properly used, can help to assess various quality factors
of OSM data and improve the decisions in some of the cartographic processes
analysed above. As explained, semantic mismatches, topological and positional
errors and vague and ambiguous cases of overlaps and intersections should
be expected when handling VGI. All these cases are challenging to
disambiguate and can negatively affect the outcome of the
cartographic processes.

When relying solely on VGI data for mapmaking, the ambiguous cases
first need to be recognised and located, and then corrected or verified by the
contributors themselves. Indeed, it has been documented that the positional
quality of features improves as more contributors add data or modify a feature
(Haklay et al., 2010). However, participation biases (Antoniou and Schlieder,
2014) and the digital divide (Graham et al., 2014) can negatively affect a
widespread effort of quality improvement. Hence, we need to devise methods, by
using diverse VGI data, that can more easily identify and correct such
potential sources of error before they enter the cartographic chain of
processes: in a sense, the mixture of diverse VGI sources might counter-balance
biases and errors from individual VGI sources.

Although  there  is  no  direct  link  between  geotagged  photographs  and  map 
scales,  it  can  be  inferred  that,  as  geotagged  photographs  usually  capture  a  small 
ground  area  from  a  close  distance  in  high  detail,  they  can  be  of  help  in  large 
scale  maps.  In  general,  cases  where  geotagged  photographs  can  provide  better 
ground  truth  include  the  efforts  to: 

•  verify  if  a  feature  exists  (i.e.  assess  completeness) 

•  verify  the  type  of  a  feature  (i.e.  assess  thematic  accuracy) 

•  verify  the  topology  and  the  relationship  between  features  (i.e.  assess  logical 
consistency) 

•  verify  the  state  of  a  feature  for  a  particular  time-stamp  (i.e.  assess  temporal 
accuracy). 

Here, as a case study, we focus on the use of another VGI source (i.e. Flickr
geotagged photographs) to evaluate the validity of OSM Points of Interest
(POIs) in three different scenarios: i) verifying OSM points that could
not have been created through image interpretation because objects
obscure the view (i.e. trees and wooded areas), and whose OSM updates
consequently normally require the physical presence of contributors on the
ground; ii) disambiguating areas of overlapping OSM land use/land cover types
at a given point in time (for more, see Antoniou et al., 2016); and
iii) correcting problematic POIs in terms of topo-semantic consistency.


6.1  Verify OSM POIs

One  of  the  comparative  advantages  of  VGI  is  that  it  can  provide  timely  data 
for  areas  and  cases  where  other  sources  cannot  be  equally  effective.  One  such 
case  is  that  of  the  areas  where  satellite  imagery  (a  prominent  way  of  capturing 
authoritative  data)  cannot  provide  the  needed  information,  e.g.  under  wooded 
areas (Figure 17). Here, local knowledge by contributors is valuable, as
in-situ observations can be an important source of information. In this context,
geotagged  images  are  well  placed  to  play  a  significant  role. 

For the verification of the OSM POIs, an online application has been
developed that displays a geotagged photograph, retrieved using the Flickr API,
and asks the user whether a specific POI can be recognised in the photograph
within approximately X metres (computed from the locations of the POI and of
the geotagged photograph). Thus, for example, the question has the form ‘Do
you see a monument about 2m away, in the photo below?’ (for more on this,
see Antoniou et al., 2016). Figure 18 shows a number of illustrative examples
generated by the application.
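
How such a question could be assembled is sketched below. The sketch assumes
that both the POI and the photograph are given as (latitude, longitude) pairs
in degrees and uses the haversine formula on a spherical Earth for the
distance; the function names are hypothetical, and retrieving the photograph
through the Flickr API is not shown. It is not the code of the application
described above.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def build_question(poi_type, poi_latlon, photo_latlon):
    """Format the verification question shown next to the photograph."""
    distance = haversine_m(*poi_latlon, *photo_latlon)
    return f"Do you see a {poi_type} about {round(distance)}m away, in the photo below?"


# Hypothetical POI and photograph taken at the same location.
question = build_question("monument", (48.8584, 2.2945), (48.8584, 2.2945))
```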

A  systematic  fusion  of  diverse  VGI  sources  can  improve  the  quality  of  the 
data  used  for  mapmaking  not  only  in  the  initial  phases  of  data  gathering  but 
also  in  a  step-by-step  implementation  of  cartographic  processes  as  shown 
above.  For  example,  in  the  case  shown  in  Figure  15,  geotagged  photographs 
could  be  used  to  examine  and  verify  if  such  openings  in  the  forest  really  exist 
or  if  the  constructions  portrayed  are  hidden  under  the  woods. 


6.2  Verify  OSM  Land  Use  /  Land  Cover 

The second case study for using geotagged photographs to evaluate a VGI
dataset comes from the Land Use/Land Cover (LU/LC) domain. Here the challenge
is to disambiguate inconsistencies regarding the actual LU/LC that arise from
contradictory feature types that occur between different OSM layers, e.g. in the
Landuse and the Natural OSM layers (a more thorough study can be found in
Fonte et al., 2016). The LU/LC at each given point should be unambiguously
retrieved: this requirement not only contributes to the overall quality of OSM
and to the correct cartographic output but also enables the use of OSM data
for the creation of LU/LC products. Here again, overlaps between different and
contradictory LU/LC feature types create inconsistencies that could possibly be
disambiguated with the use of geotagged photographs. For example, Figure 19
(left) shows the overlap of a closed construction site (purple polygon) and a
residential road (green line) in OSM (green dots represent the locations of
Flickr photographs). Although the VGI elements co-exist in the same VGI source
(i.e. in OSM), it is obvious that it is not possible for both layers to
correctly denote the actual land use of the area. The use of geotagged images
could provide the necessary information to clarify the mismatch. In Figure 19
(right), a Flickr photograph taken within the polygon clearly shows that the
area has been turned into a construction site. Additionally, a valuable
characteristic of the VGI datasets used is the time information they contain:
using the individual timestamps of features, it is possible to analyse and
understand the currency of each feature, which could be valuable in updating
the overlapping features that have outdated information.


Fig. 17: A satellite image of a sample area in Paris (left) and the polygons of
wooded areas (right) for the same area (©IGN, France).


Fig. 18: Illustrative screenshots of an ad-hoc application that retrieves
geotagged photos for POI evaluation. Creative Commons licensed (BY-NC-ND)
Flickr contributors.


Fig. 19: Mismatches between the OSM Roads and Landuse layers (left). A
Flickr photograph of the area (right). ©OpenStreetMap contributors. Creative
Commons licensed (BY-NC-ND) Flickr contributors.


The two illustrations given in this and the previous section show
that mixing independent VGI sources can prove a helpful way to spot possible
errors, to evaluate the validity of features and to justify the implementation
of various cartographic processes. In this context, the proactive
disambiguation of vague cases at large scales can lead to correct decisions on
the cartographic processes described above and avert the propagation of errors
when moving to smaller scales.


6.3  Verifying  and  Correcting  Topo-semantic  (In)consistency 

Topo-semantic consistency (Servigne et al., 2000) is a subset of logical
consistency that concerns the correctness of the topological relationship
between two objects according to their semantics. Topo-semantic consistency
refers to the consistency of geographic objects with other geographic objects
of the same theme (intra-theme consistency) or of other themes (inter-theme
consistency). Inconsistency exists in VGI due to the absence of integrity
constraints and, therefore, depends on the expertise of the data contributor. A
map should not portray inconsistencies; thus, inconsistencies should be
identified and resolved during the mapmaking process. Instead of correcting
these errors in order to satisfy consistency blindly and without taking reality
into account, correction can be based on ground truth provided by Flickr images,
as explained earlier.

A number of tests can be applied in order to find inconsistencies in the
OSM data between features from the same layer (e.g. two roads), or from
different layers. Tests are based on consistency evaluation utilising
topological relations that the data should satisfy, taking the data semantics
captured by their attributes into account as well. In OSM, apart from the
geometry capture, the existence of a plethora of tags provides a rich semantic
dataset, and thus sophisticated topo-semantic relations can be explored. Here,
we focus on POIs because they are more easily captured in photographs due to
their dimensions. POIs that are problematic with regard to their position in
comparison to other layers can be verified with Flickr images. If the Flickr
images prove that the topo-semantic relation is correct, then no changes are
made; otherwise the geometry (relative horizontal position) and/or the semantic
information (Type tag) is updated according to the photograph. Finally, the
topo-semantic relations are re-evaluated.

A case study was performed with OSM data that cover the broader Paris area
(Antoniou et al., 2016). According to this study, in the area of interest there
are 22,527 OSM POIs with two main attribute tags related to their identity:
Name and Type. Topological relations of POIs against other thematic layers are
examined based on a number of checks, and the errors found are then examined
using Flickr photographs. For example, it is important to investigate the
topological relationship between POIs and buildings, examining whether POIs
should be situated inside or outside building polygons. Initially POIs are
clipped with the convex hull of the area covered by buildings, resulting in
60136 points. Of these, 21872 are situated inside the building polygons, 2338
(4%) are situated on the building boundaries and 35926 (60%) are situated
outside. For the points outside, it is examined whether their position is valid
based on their semantics, as captured by the Type attribute. Based on this test,
30497 (85% of the points outside) can indeed be situated outside but 5429
(15%) should be situated inside the building polygons
and need further investigation. Similarly, 24210 points are situated
inside the building polygons or on their boundaries. Based on a similar test,
22047 (91%) can indeed be situated inside but 2163 (9%) should be situated
outside the building polygons and need further investigation. In this study,
the correct position of the points in relation to the buildings was decided
according to common sense.
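
The inside/outside test can be sketched as a lookup against a rule table. The
sketch below assumes that the topological relation of each POI to the building
polygons has already been computed, and that the rules are expressed per Type
value; the Type values, the rule table and the decision to treat boundary
points as inside are all hypothetical illustrations of the "common sense"
rules mentioned above, not the rules used in the study.

```python
# Hypothetical rule table: where a POI of a given Type is expected to lie
# relative to building polygons. These values are illustrative only.
EXPECTED_RELATION = {
    "restaurant": "inside",
    "pharmacy": "inside",
    "bus_stop": "outside",
    "bench": "outside",
}


def flag_for_review(pois):
    """Return ids of POIs whose observed relation contradicts the rule table.

    Each POI is (poi_id, poi_type, relation), with relation one of
    'inside', 'boundary' or 'outside'. Boundary points are treated as
    inside, and types missing from the table are not flagged.
    """
    flagged = []
    for poi_id, poi_type, relation in pois:
        expected = EXPECTED_RELATION.get(poi_type)
        observed = "inside" if relation == "boundary" else relation
        if expected is not None and observed != expected:
            flagged.append(poi_id)
    return flagged


sample = [
    (1, "restaurant", "inside"),
    (2, "restaurant", "outside"),
    (3, "bus_stop", "boundary"),
    (4, "bench", "outside"),
    (5, "fountain", "inside"),
]
flagged = flag_for_review(sample)
```

Only the flagged POIs (here the restaurant outside any building and the bus
stop on a building boundary) would then be checked against photographs.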

In another test, POIs that are semantically related to roads and railways are
examined against the network geometry. Regarding POIs that are tagged as
crossings (12612), 99.5% (12552) are situated on road intersections and only
60 of them (0.5%) have a different position and need checking. Regarding POIs
that are tagged as traffic lights (2310), 99.2% (2292) are situated on road
intersections and only 18 of them (0.8%) have a different position that will be
further checked. POIs that are tagged as ‘level crossings’ (209) and
‘railway_crossing’ (1) are situated on the rail network intersections. Points
semantically related to the intersections of the rail and road network, such as
level crossings, are checked in relation to the actual intersections of the
road and rail network. Of the 1101 points, 949 (86%) are situated on the
intersections while 152 (14%) have a different position and need further
investigation. Of course, map scale is also an important factor when judging
distance. For example, the distance between network junctions and POIs tagged
as crossings might be negligible in relation to scale.

The inspection of topo-semantic relations highlights areas where consistency
is not fulfilled and should be corrected during the mapmaking process.
Pre-processing based on topo-semantic relations limits the intervention of
cartographers to only those cases that are problematic. Whereas an in situ visit
costs time and money, the provision of ground truth through geotagged Flickr
images is a welcome alternative solution emerging from the VGI universe.
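
The remark above on map scale and distances to junctions can be illustrated
with a minimal sketch. It assumes projected coordinates in metres and a
hypothetical tolerance of 0.2 mm on the map, converted to a ground distance
through the scale denominator; both the tolerance value and the function names
are assumptions made for illustration.

```python
import math


def ground_tolerance_m(scale_denominator, map_tolerance_mm=0.2):
    """Ground distance (m) covered by a given distance (mm) on the map."""
    return map_tolerance_mm / 1000.0 * scale_denominator


def needs_checking(poi, junctions, scale_denominator):
    """True if the POI is further from every junction than the tolerance."""
    nearest = min(math.dist(poi, junction) for junction in junctions)
    return nearest > ground_tolerance_m(scale_denominator)


# Hypothetical crossing located 3 m away from the nearest road junction.
junctions = [(0.0, 0.0), (100.0, 100.0)]
crossing = (3.0, 0.0)
check_at_10k = needs_checking(crossing, junctions, 10000)
check_at_25k = needs_checking(crossing, junctions, 25000)
```

With these assumptions the tolerance is about 2 m at 1:10k and about 5 m at
1:25k, so the same 3 m offset is worth checking at the larger scale but is
negligible at the smaller one.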

7  VGI  and  Symbol  Specification 

This section discusses issues related to VGI symbolisation, and is more
forward-looking than the previous ones. As with the other previously described
cartographic processes, the main issues regarding symbol specification with VGI
are: what could be impacted by this new source of data, and what should be
adapted and how? A reminder of the symbol specification process is given first.
Then, we highlight aspects to be discussed and controlled to adapt this process
to VGI.


7.1  The  Symbol  Specification  Process 

The symbol specification process occurs at the end of the global cartographic
design process. At this stage, the input objects should be generalised for the
expected map scale in order to be able to properly specify styles that are
suitable at this scale. Traditional cartographic symbolisation, for instance in
map series production, is based on historical knowledge of symbol specifications
and cartographic practices and processes, related to a particular topographic
style (Ory et al., 2015). Symbol specifications have also been considered as a
user controllable problem in order to make personalised maps (Christophe,
2011). Research on style and symbol specification now focuses on processes
inspired by computer graphics to mimic traditional cartographic symbolisation,
or to apply artistic styles to maps (Christophe et al., 2016). The three main
steps of the symbol specification process are:

•  Legend specification: themes and semantic relations between map themes. It
first requires that the legend be structured by semantic themes with
semantic relationships (e.g. rivers and lakes are in the same legend theme and
their symbols should be related).

•  Style  specification:  signs  for  themes.  This  requires  choosing  and  combining 
relevant  graphic  signs  to  enhance  semantic  relations  on  the  map. 

•  Map  rendering.  The  rendering  step  effectively  applies  the  style  specification 
to  the  cartographic  objects  on  the  map.  It  may  involve  complex  rendering 
techniques,  such  as  textures  to  render  forest  areas. 

Tools such as Mapnik2, which are used to make maps with OSM, do provide
some basic rendering methods, including polygon texture fills or advanced text
rendering, that could be extended to help users complete the three steps of
symbol specification.


7.2  Discussion  and  Guidelines  for  Using  VGI  in  Symbol 
Specification  Processes 

As for the other mapmaking processes, the first issue to address when using
VGI in symbol specification processes is the adaptation of processes developed
for consistent databases to the heterogeneity of VGI. This adaptation can be
achieved by a characterisation of VGI features, i.e. their quality, semantics
and LoD. But such characteristics of quality or LoD are no longer consistent
within a given map theme, as each VGI feature might have its own quality or
LoD. Thus a single symbol specification for each map theme might not be
possible with VGI. For the same map theme, for instance rivers, the symbol
might be adapted to the quality, semantics and LoD of the features (e.g. darker
shades of blue and wider symbols for rivers with more details/better quality).
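
The river example can be turned into a small sketch of a per-feature style
function. It assumes a hypothetical quality score between 0 and 1 attached to
each feature, and the stroke widths and colour values are arbitrary choices
made for illustration; no particular rendering tool is implied.

```python
def river_style(quality):
    """Map a quality score in [0, 1] to a stroke width and a shade of blue."""
    q = min(max(quality, 0.0), 1.0)          # clamp out-of-range scores
    width = 0.5 + 1.5 * q                    # wider symbol for better quality
    light, dark = (170, 210, 255), (0, 60, 160)
    # Linear interpolation from light blue (q = 0) to dark blue (q = 1).
    rgb = tuple(round(lo + (hi - lo) * q) for lo, hi in zip(light, dark))
    return {"stroke_width": width, "stroke": "#%02x%02x%02x" % rgb}


rough_river = river_style(0.0)
detailed_river = river_style(1.0)
```

Features of the same theme thus share a colour family, which keeps the legend
readable, while the shade and width convey the differences between features.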

A typical use case of maps made with VGI is the mashup map with
crowdsourced thematic data on top of existing reference data. In this case, the
symbol specification for the reference background might have been designed
independently from the thematic data; thus the addition of thematic VGI involves
three problems:

•  Management  of  contrasts:  the  thematic  data  should  be  more  legible  than 
the  background  and  the  contrast  in  the  background  should  be  altered  to 
optimise  the  contrast  with  the  thematic  data. 

•  Preserving  a  topographic  style:  adding  a  crowdsourced  thematic  layer 
should  not  prevent  the  map  reader  from  understanding  the  topographic 
style  of  the  background. 

•  Visualising imprecision aspects: the thematic layer is both heterogeneous in
terms of quality and different from the background. Thus the symbol
specification should convey these differences as much as possible (see Chapter 9
by Skopeliti et al. (2017) regarding quality visualisation).


7.3  Crowdsourcing the Symbol Specification Process?

The symbol or style specification process is user-driven, as the map purpose
and the map user needs are translated into a legend and rendered on the map.
In addition to the use of crowdsourced data in the map, a crowdsourced
map could also include greater interaction with the user during
the mapmaking process: for example, a consensus decision among OSM
contributors could be reached regarding the colour to use to render the forest
areas in the standard display. Research on automated on-demand mapping
tries to capture the needs of users through techniques such as ontologies and
interactions (Bailey et al., 2014), but allowing the users to choose the way
crowdsourced data can be rendered in the legend and the map requires a step
further in this direction.


8  Conclusions  and  Further  Work 

This chapter addressed the challenges of automated mapmaking using VGI as
input data. VGI differs from traditional geographic databases because of
heterogeneities in quality and LoD, and because of thematic diversity, so
existing methods for automated mapmaking have to adapt to this situation. This
chapter described a proposition to infer the LoD of VGI features to overcome
heterogeneity, and then presented methods that use this inference to make maps
at different scales using map generalisation or LoD harmonisation. The chapter
also proposed techniques to overcome the quality heterogeneity, which can alter
the map legibility. Finally, the chapter discussed how advanced stylisation
techniques could be applied to VGI.

There is much more work to be done, as automated mapmaking itself is a
large research topic. The long-term goal is to design adaptive and completely
automated cartographic processes, because the amount of data is too large for
manual cartography, and the content has to be adapted to different needs and
display devices. Beyond continuing to improve the methods presented here, it
must be noted that generalisation and harmonisation operations do not handle
quality heterogeneities yet, and we should investigate how such processes
can adapt to quality information that can be inferred from VGI features
similarly to the handling of LoD information discussed above. For instance, a
forest imported from Corine Land Cover and one captured precisely with satellite
imagery do not require the same simplification algorithms. The future
diffusion of web maps will be based on vector maps using vector tiling, such as
the OpenScienceMap project that provides a vector mapping of OSM. Such web
maps will raise several research questions, such as that of the online
triggering of generalisation and harmonisation processes, when such processes
are mostly designed for offline processing. The question of tiled processing is
also an issue, as mapmaking processes make considerable use of the geographic
neighbourhood of features to choose the best process. The development of vector
web maps will also enable user customisation of stylisation, which will require
addressing the research issues discussed in the last section of this chapter.


Previous  publication 

Section  6  was  partly  published  in  Antoniou,  V.,  Skopeliti,  A.,  Fonte,  C.,  See,  L., 
Alvanides,  S.  (2016).  Using  OSM,  geo-tagged  Flickr  photos  and  authoritative 
data:  A  quality  perspective,  in  Bandrova  T.,  Konecny,  M.  (Eds.)  Proceedings,  6th 
International  Conference  on  Cartography  and  GIS,  13-17  June  2016,  Albena,
Bulgaria.  Available  at  http://cartography-gis.com/docsbca/iccgis2016/ICC- 
GIS2016-49.pdf  [Last  accessed  13  April  2017] 

In section 6 the link between quality control and the topographic maps is additionally
discussed, as the previous paper did not focus on a particular application.


Notes 


1  http://espaceloisirs.ign.fr 

2  http://mapnik.org 
Abstract

Volunteers are the key component in the collection of Volunteered Geographic
Information (VGI), so what motivates their participation, what strategies work
in recruitment and how sustainability of participation can be achieved are key
questions that need to be answered to inform VGI system design and implementation.
This chapter reviews studies that have examined these questions
and presents the main motivational factors that drive volunteer participation,
as determined from empirical research. Some best practices from broader citizen
science applications are also presented that may have relevance for VGI initiatives.
Finally, a set of case studies from our experiences are used to illustrate
how volunteers have been motivated to collect VGI through mapping parties,
gamification and working with schools.
1  Introduction

Volunteered Geographic Information (VGI; a term originally coined by Goodchild,
2007) has two main components, i.e. the volunteer and the spatial information.
Much of the literature on VGI examines either the second component,
i.e. the geographic data collected, often in relation to its quality (e.g. Flanagin
and Metzger, 2008; Haklay, 2010; Foody et al., 2013; Antoniou and Skopeliti,
2015), or how VGI has been used in different contexts (e.g. Zook et al., 2010;
Barrington et al., 2011; Mooney and Corcoran, 2011; Connors et al., 2012). Yet
it is the volunteer that is actually at the heart of VGI and the reason why there
are many successful examples of it (See et al., 2016; Chapter 2 by See et al.,
2017), one in particular being OpenStreetMap (OSM). Thus issues such as
attracting and retaining volunteers, and understanding participant motivations
and what incentives can be used to attract volunteers, are as important as the
spatial information that is collected, particularly in designing new VGI applications.
The importance of the volunteer has been recognised in a recent paper by
Gomez-Barron et al. (2016), where the authors consider motivational factors
for VGI as a critical part of the participation planning phase in the design of
any VGI system.

There are biases observed in participation that are a general characteristic
of any application of user-generated content. One of these is referred to as
the 1% rule (or the 90:10:1 rule), which states that 90% of the content is provided
by only 1% of the users (Nielsen, 2006). A further 9% of users provide
content some of the time, while the remaining 90% use the content but do not
contribute anything. Although these numbers may change slightly from application
to application, Nielsen (2006) argues that participation inequality cannot
be eliminated. Such inequalities exist even in highly successful collaborative
applications such as Wikipedia; for example, He (2012) found that active users
have generated around 3.5% of the content of Wikipedia and that this general
pattern has not changed over time, while Wikipedia’s own statistics for
2016 show that less than 0.5% of content is currently provided by active users
(Wikipedia, 2016). Despite the success of OSM, there are also biases in it:
Neis and Zielstra (2014) reviewed participation inequality studies for OSM
and found that 10% of those registered in 2008 contributed actively, while a
study in 2010 showed that only 3.5% of volunteers accounted for 98% of the
content (Neis et al., 2011).
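
Participation shares of the kind quoted above can be computed directly from per-user contribution counts. The following Python sketch illustrates the arithmetic only: the function name top_share and the edit counts are hypothetical and chosen to resemble the 1% rule, and they are not taken from Nielsen (2006) or from any of the OSM or Wikipedia studies cited.

```python
from typing import Sequence


def top_share(contributions: Sequence[int], top_fraction: float) -> float:
    """Return the fraction of all content produced by the most active
    `top_fraction` of users (hypothetical helper, for illustration only)."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total = sum(contributions)
    if total == 0:
        return 0.0
    ranked = sorted(contributions, reverse=True)
    # Always count at least one user so that very small communities
    # do not produce an empty "top" group.
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / total


# Assumed community of 100 registered users (illustrative numbers only):
# 1 heavy contributor, 9 occasional contributors, 90 users who only consume.
edits_per_user = [900] + [11] * 9 + [0] * 90

heavy_share = top_share(edits_per_user, 0.01)   # share of the top 1% of users
active_share = top_share(edits_per_user, 0.10)  # share of the top 10% of users
print(f"top 1%: {heavy_share:.1%}, top 10%: {active_share:.1%}")
```

Applied to real contribution logs, the same calculation would show how far a given VGI project departs from, or conforms to, the skewed pattern described above.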

Given these highly skewed figures, the aim of this chapter is to present
ways in which the number of active participants can be increased in order to
change the shape of the participation inequality curve (Nielsen, 2006). The
starting point is to understand the nature of VGI participants and what motivates
their contributions. Through a review of existing studies of VGI motivation,
the factors that are relevant to the development of strategies to improve
recruitment and to increase the motivation and retention of volunteers in
VGI are outlined. This is followed by a synthesis of some of the best practices
from VGI and citizen science experiences. Finally, case studies of VGI are
used to highlight different ways in which recruitment, motivation and retention
have been tackled.


2  What  Motivates  Volunteers  in  VGI? 

2.1  The  Nature  of  Volunteers 

To help understand volunteer motivations with respect to VGI and how they
might differ between participants, it is useful to first understand the nature of
the volunteers that take part in VGI. This is usually done by classifying volunteers
into types according to factors such as their knowledge of the subject or
their degree of participation. Coleman et al. (2009) offer one typology of five
types that are situated along a spectrum ranging from Neophytes at one end,
who include individuals that have no background in the area but have the time
and interest to contribute, to Expert Authorities at the other end, who have
considerable experience in mapping technologies and product specifications;
in between are Interested Amateurs, Expert Amateurs and Expert Professionals.
However, Coleman et al. (2009) argue that this typology is too simplistic for
VGI, offering some examples of where the typology breaks down: for example,
a Neophyte may have little expertise in the subject area but their local knowledge
of an area might mean they can provide valuable contributions that more
experienced individuals from other types cannot.

Another  typology,  which  was  developed  as  part  of  a  EuroSDR  Workshop,  is 
offered  by  Heipke  (2010).  It  includes: 

•  map lovers and experts, who would be happy to provide accurate information
when, for example, maps are wrong or information is missing;

•  casual  mappers  such  as  those  from  the  biking/hiking  community; 

•  media  mappers  that  respond  to  specific  campaigns  in  bursts  of  activity  such 
as  during  mapping  parties  or  post-disaster  events; 

•  passive  mappers,  e.g.  people  who  provide  traffic  data  via  their  mobile  phone; 

•  open  mappers,  e.g.  those  contributing  to  initiatives  such  as  OSM; 

•  and mappers that would be motivated by financial incentives, e.g. through
using Amazon’s Mechanical Turk.

This typology already provides some insights into possible motivational factors
such as interest in the subject or material gain. The open mappers were identified
as being the largest group after passive mappers and one that is increasing
in size over time. Although their motivations are thought to be altruistic and
related to building and using open datasets as a public good (Goodchild, 2007;
Heipke, 2010), the range of motivations driving the group of open mappers is
much more complex and nuanced (Budhathoki and Haythornthwaite, 2012), as
outlined in the next section.


2.2  Motivational  Factors  for  VGI  Participation 

Coleman et al. (2009) offer different motivations for participation in VGI that
are based on empirical research from Wikipedia and the open source community.
These include: altruism; professional or personal interest; intellectual
stimulation; protection or enhancement of a personal investment; social
reward; enhanced personal reputation; participation providing an outlet for
creative and independent self-expression; and pride of place. The idea of local
knowledge is captured in pride of place and is relevant to applications such
as OSM where mappers more frequently map or update their local areas than
areas further afield unless they are driven by mapping parties or humanitarian
causes. However, other motivating factors, such as providing an outlet for creative
and independent self-expression, may be less relevant to the mapping of
features in OSM.

A very comprehensive identification of motivational factors for VGI has been
provided by Budhathoki and Haythornthwaite (2012), who reviewed the literature
on motivations from three distinct yet relevant domains: volunteerism;
leisure; and the generation of knowledge online. The factors were divided into
intrinsic motivations, which come directly from the individual, and extrinsic
motivations, which come from outside the individual, such as financial incentives or
gaining a positive reputation based on the quality of one’s contributions or
from peers. The factors are listed in Table 1 and are summarised from the original
list that was provided in Budhathoki (2010). They can provide the basis for
further investigation into understanding the motivations of participants in any
given VGI application.

Budhathoki and Haythornthwaite (2012) used the motivational factors listed
in Table 1 as the basis of a survey undertaken with OSM volunteers in order to
understand which motivations were the most important for these volunteers.
They also differentiated between two types of volunteers, i.e. serious mappers
and casual mappers, based on the number of contributions, the length of the
contributions or the frequency of contributions. The survey of 444 OSM
volunteers showed that two extrinsic factors, i.e. community and the project
goal, and the intrinsic factors of unique ethos and altruism were the most
important. However, casual mappers ranked unique ethos as more important
than serious mappers did. Other important factors included the importance of local
knowledge (instrumentality and self-efficacy), the freedom to provide information
where one wanted, trust in the system and fun. Serious mappers also
positively rated learning as a motivation, and in a much stronger manner than
casual mappers did. Understanding these motivations can provide strategies
to turn casual mappers into more serious ones, e.g. ways that may help build
confidence and emphasising the importance and strengths of local knowledge.

Table 1: Motivational factors for VGI (adapted from Budhathoki, 2010).

Type       Factor                          Relation to VGI
Intrinsic  Unique ethos                    Maps should be freely available as an open public good
           Learning                        Gaining new knowledge about mapping and places
           Personal enrichment             Satisfaction in contributing
           Self-actualisation              Appreciation of talents and skills in mapping and of local knowledge
           Self-expression                 Ability to express skills and knowledge of mapping and local areas
           Self-image                      Gaining confidence in self through contributions
           Fun                             Enjoying the process of contributing and seeing contributions online
           Recreation                      Mapping outdoors
           Instrumentality                 Providing critical inputs to a map that would otherwise be wrong or missing information
           Self-efficacy                   Feeling of being effective through contributions
           Meeting own needs               Filling gaps in spatial information needed for different applications
           Freedom of expression           Ability to choose what information to provide and how
           Altruism                        Contributions to a social cause
Extrinsic  Career                          Contributions become part of a CV or lead to marketable skills
           Strengthening social relations  Creating strong bonds, e.g. through mapping parties or other socially constructed events
           Project goal                    Alignment between goals of the project and those of the contributor
           Community                       Being part of a bigger, sustaining community
           Identity                        Becoming part of a group, e.g. advancing to an expert group
           Reputation                      Recognition from the system or individuals in the community
           Monetary return                 Being paid for contributions or making money from the data
           Reciprocity                     The idea that if you contribute, others will contribute
           System trust                    Will contribute if there is trust in the system
           Networking                      Contributing forms networks locally and internationally
           Socio-political                 Contributing meets socio-political motivations

In a separate study by Tiwari et al. (2010), a survey of motivations was undertaken
with volunteers in OSM and the GISCorps. The top motivational factors
in both groups were found to be altruism, personal satisfaction and gaining
new geospatial knowledge. Other factors from Table 1 were also chosen,
including strengthening of social relationships and fun. Participants were also
asked what incentives they would like to receive in order to increase participation.
Around one quarter replied that no incentives were needed, while another
quarter wanted additional geospatial training. Composto et al. (2016) considered
the need to provide something back to the volunteers as a motivator: they
examined two VGI initiatives, and found that the one that had more visible
impact, i.e. the one that resulted in broken streetlights being reported and fixed,
was the one that has had longevity and sustained participation.


3  Best  Practices  in  Volunteer  Recruitment,  Motivation  and  Retention

To  attract  volunteers  to  contribute  to  a  VGI  initiative,  there  are  three  key  issues 
to  consider: 

•  What  methods  should  be  used  to  recruit  participants? 

•  How  will  the  volunteers  be  motivated  to  contribute  given  all  the  different 
motivational  factors  that  have  been  identified  through  empirical  research? 

•  How  can  participation  be  maintained  in  the  long  term? 

Past initiatives have already considered many of these issues, so this section presents
different approaches that have been taken in practice. In fact, much of the
good practice in volunteer recruitment, motivation and retention stems from
citizen science initiatives, i.e. the involvement of citizens in scientific research
(Bonney et al., 2009). Broader than VGI, citizen science is widespread in areas
such as biodiversity monitoring (Hyvoenen et al., 2013; Clavero and Revilla,
2014) and astronomy (Clery, 2011). Although citizen science is not specifically
geographic in nature, there are lessons valuable to VGI that have been learned
from numerous citizen science projects, some of which are presented below.


3.1  Recruitment 

The guidance document written by Tweddle et al. (2012) provides different
recruitment strategies for citizen science projects, where the starting point is to
determine the target audience, e.g. whether the project is targeted to the general
public, to map lovers, to school children, etc. The promotion and recruitment
process can then be tailored towards this group using a range of channels, including
email, social media and the press. Experiences from Nature’s Notebook, a citizen
science project in the USA to collect phenology data (i.e. life stage data) from
plants and animals, have shown the necessity to carefully identify target audiences
and then to contact them with messages that are focused on explaining the personal
benefits of contributing (Crimmins et al., in press). Nature’s Notebook had
little success when advertising its programme to the general public so instead
targeted the members of another citizen science initiative with similarly rigorous
protocols for data collection, and this has been a very successful method of
recruitment for the project.

Holding  a  launch  event  or  side  event  at  existing  conferences,  workshops  and 
festivals  can  be  an  effective  way  of  informing  potential  volunteers  about  the 
aims  of  the  project,  about  why  their  help  is  important  and  about  what  they  will 
gain  from  the  project.  The  project  goal  was  ranked  highly  as  a  motivator  for 
OSM  (Budhathoki  and  Haythornthwaite,  2012),  so  communicating  this  aspect 
is  clearly  important  for  attracting  volunteers. 

Composto et al. (2016) examined the use of media campaigns to recruit volunteers
in two VGI projects. They showed that this is a very effective way of
bringing individuals to the website but that contributions decreased rapidly
after the intervention, indicating that the use of the press has limited influence
over time; thus other methods need to be used in combination with the media
to continually stimulate recruitment.

OSM uses mapping parties as a way of recruiting new individuals and providing
social contact with other OSM mappers while serving the purpose of
increasing map coverage in a particular area (OSM, 2015). An interesting study
by Hristova et al. (2013) showed that mapping parties did increase the amount
of data collected during the event and did result in greater contributions after
the event, generally for light to medium contributors in the short term and
heavy contributors in the longer term. Mapping parties also retained more
experienced users but failed to retain newcomers, possibly because it was more
difficult for them to integrate socially in an already established community;
thus more focus on integration of novices at these events is recommended,
as well as more emphasis on easy-to-use tools and on the fun aspect. Similar
events could be organised for other VGI initiatives, using the experience gained
by the OSM community in running these events.

Another way of recruiting volunteers is to make explicit links to education,
motivating students to take part in VGI initiatives. Some of the current partnerships
between mapping agencies and schools are described by Olteanu-Raimond
et al. (2017) in Chapter 13 and by Bol et al. (2016). A very successful
example of citizen science linking to education is the GLOBE (Global Learning
and Observations to Benefit the Environment) Program, which was initiated
by Al Gore in 1995. The programme aims to increase environmental awareness
by actively involving students in science, including through mapping. Similarly,
integrating volunteer service directly into educational programmes is another
effective way to recruit and motivate individuals. There are many examples of
this in the conservation arena, such as the Master Naturalist Programs or the
Conservation Stewards Programs established in different US states (Van Den
Berg et al., 2009) that provide individuals with a certification and require a certain
number of volunteer hours, both as part of the certification and to keep the
certification once it has been gained. This type of approach could be modified
to include mapping as a volunteer activity and could encourage longer term
engagement.


3.2  Motivation  and  Retention 

Nielsen (2006) provides some general advice for improving participation equality
(i.e. increasing the numbers that actively contribute) in social media and online
communities that also has relevance for VGI. The first recommendation is to
make it as simple as possible to contribute. This is already implemented in OSM
in the sense that users are free to choose which features they contribute and in
which locations; furthermore, this was highlighted as one of the main
motivators for contributing to OSM in the study by Budhathoki and Haythornthwaite
(2012). Part of this recommendation also refers to the design of the site
and the ease of use, which can clearly influence participation. The Zooniverse
citizen science project has put a considerable amount of effort into the design
of its projects and much can be learned from its approach (Prestopnik, n.d.).
Zooniverse now offers a platform to host other citizen science projects, allowing
new initiatives to benefit from its design principles while also having access
to a large community of citizen scientists; new VGI initiatives should consider
this option of working with Zooniverse.

Another relevant recommendation from Nielsen (2006) is to make participation
part of another activity so that volunteers do not find the act of contributing
a burden. Passive data collection from communities such as hikers
and bikers or from geotagged repositories are some examples that could be
harnessed within VGI applications; alternatively, gamification, or the addition
of game mechanics to applications (Deterding, 2012), can lower the burden of
participation while adding an element of fun, which is another key motivator
for participation in VGI (Budhathoki and Haythornthwaite, 2012; Tiwari et al.,
2010). An example of gamification is the Ingress augmented reality game by
Google, where players gather spatial information that is then used to update
Google Maps as a side task to the main goal of the game, which is to find
portals (Carney, 2012). Gamification has also been shown to help motivate
participation in citizen science applications such as Project Budburst, which
developed the Biotracker app for gathering phenology data: use of technology
such as smartphones, coupled with competitive elements such as badges
and leaderboards, was shown to appeal to the younger ‘Millennial’ audience
(Bowser et al., 2013). A number of game apps have been built for gathering
OSM data, e.g. AddressHunter, which is a role-playing game that also involves
adding addresses to the OSM database, and Kort Game, for adding new features
to OSM (OSM, 2013).

Motivation is also clearly linked to maintaining participation in the longer
term. The use of different incentives can be a powerful way to achieve this.
Reputation and confidence building measures can be effective ways to motivate
volunteers. The citizen science project iNaturalist, for example, awards different
levels of expertise to volunteers, from novice to expert, which recognises their
knowledge and degree of contribution. Each observation is also given a stamp
of quality, which can build confidence in the contributors, particularly when
the observations are considered to be of research grade quality. This follows the
advice of Nielsen (2006) to promote high-quality contributions. In Wikipedia,
contributors can take on roles with increasing responsibilities within the community,
including arbitration and administration (Bryant et al., 2005), which is
also a reputation and confidence building measure.

Another incentive is related to the impact of contributions. In OSM, contributors
can quickly see their changes on the map, which acts as an important
form of visual feedback. Correcting areas and filling in missing information
can provide a form of satisfaction that acts as a motivating factor; thus the
design of VGI initiatives should include good visual displays (Budhathoki and
Haythornthwaite, 2012). Experiences from Nature’s Notebook with regard
to retention have highlighted the need to provide frequent communication
to volunteers, acknowledge the value of their contributions on a regular basis
and show that their contributions are being used (Crimmins et al., in press).
Nature’s Notebook relies heavily on digital communication of various forms,
ensuring that the content of the communication is information-rich, including
summaries of publications that have used the data, which are communicated
in simple language. Finally, the project provides different opportunities for volunteers
to participate, which are based on problem solving approaches to keep
volunteers engaged over time.

Rewarding volunteers in other ways can also be an effective approach for
encouraging and supporting participation. A reward system can be implemented
in several different ways; for example, Estes et al. (2016) have used
Amazon’s Mechanical Turk to do cropland mapping through digitisation of
fields for part of South Africa using performance-based micro-payments.
Maps with 91% accuracy were produced, and the authors calculated that a
detailed cropland map for all of Africa could be created with 2 to 3 million
USD and the crowd. Several campaigns have been run using the Geo-Wiki
tool for visualisation, validation and crowdsourcing of land cover (Fritz et al.,
2012; See et al., 2015), where incentives have ranged from Amazon vouchers
to co-authorship on a scientific publication. However, Nielsen (2006) makes
the point that participants should not be over-rewarded as this might encourage
the most active volunteers to dominate and thereby disincentivise others
from contributing.
4  Case  Studies

This  section  describes  a  set  of  case  studies  based  on  our  experiences  to  illustrate 
different  ways  in  which  volunteers  have  been  motivated  to  contribute  VGI  to 
different  applications. 


4.1  Mapping  Parties 

As mentioned previously, mapping parties are intended to map a specific area
over a short period of time while introducing newcomers to VGI. This case
study describes experiences with two mapping parties that were organised as
social events for delegates at the recent FOSS4G (Free and Open Source Software
for Geomatics) Europe conference¹, held in July 2015 at the Politecnico
di Milano, Como Campus (Figure 1). The first mapping party was a traditional
OSM one, while the second focused on indoor mapping. To recruit participants,
the mapping party organisers presented their ideas and calls for participation
during the opening session of the conference. Information about the
events was also communicated over social media, via the official conference
website and via OSM in order to attract and sustain participation throughout
the conference.

The OSM mapping party was designed and set up by a small number of
active OSM contributors who were attending the conference (Mooney et al.,
2015); their goal was to collect Points of Interest (POIs) that were missing in
Como city. Around 40 participants (roughly 10% of the conference delegates) attended
and were taught how to collect the data using field papers, which are a specific
service to print out OSM maps for annotation in the field. The POIs were then
mapped in around 2.5 hours. On the second day of the conference, there was a
data upload session that showed the volunteers how to insert their data into the
OSM database; this session was too short, so not all data were entered into the
database during the event. However, the POIs were monitored after the event
and showed increased mapping activity over the summer, which is attributed largely
to this particular mapping party as local OSM activity in the city is not large.
Thus, the mapping party motivated interested individuals by providing them
with training and a social, community-based atmosphere in which to collect
and upload the data. Given the increase in POIs over the summer, this may
have led to some individuals continuing to contribute to OSM.

Fig. 1: Photographs from the mapping parties at the FOSS4G 2015 Europe
conference.

The second mapping party was focused on indoor mapping, which is something
new compared to the more traditional OSM outdoor mapping parties.
The main purpose of the event was to raise awareness of the scientific, technical
and practical challenges associated with indoor mapping. The IndoorGML
standard was used to collect the navigation pathways through rooms and in
connecting spaces. The indoor mapping party received attention from the local
television and more than 30 participants took part in the event. Almost all of
the mappers generated data, but only some of them contributed to the result,
mainly due to technical issues and shortage of time. The overall result was a single,
merged navigable graph of two floors of the university building (Figure 2).


Fig. 2: Screenshot of the merged navigation graph from the participants of the
Indoor Mapping Party held at the FOSS4G 2015 Europe Conference.

The indoor mapping party produced positive results as novices learned about
the  concepts,  strategies,  problems  and  tools  for  mapping  indoor  spaces  while 
the  researchers  and  developers  received  feedback  on  the  techniques  and  tools 
used  during  the  event. 

Overall,  the  mapping  parties  were  inclusive  and  friendly  experiences  and  are 
recommended  as  side  events  at  future  FOSS4G  conferences.  At  both  parties, 
the incentive was the social aspect, i.e. spending time together, learning something
new, making a useful social contribution and having fun. An additional
incentive was offered, i.e. prizes were given to the top three contributors at the
closing ceremony of each event. Thus both mapping parties appealed to a range
of intrinsic and extrinsic motivations. Both events were successful in attracting
participants, and the OSM mapping party may have led to the recruitment
of new OSM participants who continued to contribute beyond the
actual event. The indoor mapping party was more focused on the learning element
as a motivator. The main disadvantage associated with both mapping parties
was time, e.g. there was insufficient time to complete the uploading of POIs
from the paper-based surveys, and this had to be completed by the mapping
party staff after the event.


4.2  Gamification 

4.2.1  Cropland  Capture  and  Picture  Pile 

As  mentioned  previously,  a  number  of  Geo-Wiki  crowdsourcing  campaigns 
have  been  organised  in  the  past  to  collect  data  on  land  cover  (See  et  al.,  2015). 
Although these campaigns were successful, we wanted to investigate gamification
as a way to attract larger numbers of participants and thereby collect more
data  to  improve  global  land  cover  maps.  Cropland  Capture  was  the  first  serious 
game  developed  by  the  Geo-Wiki  team  as  a  simplified  version  of  the  previous 
applications.  The  interface  was  designed  to  be  mobile  as  well  as  desktop-based, 
running  on  browsers,  smartphones  and  tablets  (for  both  iOS  and  Android 
operating  systems).  The  game  was  launched  in  mid-November  2013  and  ran 
until the beginning of May 2014. As part of the game the players were presented
with a red rectangle enclosing satellite imagery or photographs, as shown in
Figure  3a.  Players  were  then  asked  to  determine  if  there  was  any  evidence  of 
cropland  in  the  image  contained  within  the  rectangle.  The  interface  for  mobile 
devices  was  designed  such  that  players  swiped  the  images  into  three  possible 
categories:  Yes,  No  or  Maybe.  For  each  correct  answer,  the  player  received  a 
single  point,  while  one  point  was  deducted  for  incorrect  answers.  Correctness 
was  determined  through  majority  agreement,  although  there  was  an  option  to 
challenge  the  crowd  if  the  player  felt  that  they  had  been  incorrectly  penalised. 
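The scoring rule described above can be sketched in a few lines. This is a minimal illustration and not the Geo-Wiki team's implementation: the function names are hypothetical, and the handling of ties and of 'Maybe' answers (no points awarded or deducted) is an assumption, since the text does not specify it.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent answer among the crowd's votes.

    Returns None when there are no votes or the two leading answers tie
    (tie handling is an assumption; the chapter does not describe it).
    """
    ranked = Counter(votes).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def score_answer(answer, votes):
    """Score one answer against the majority: +1 if correct, -1 if incorrect.

    Assumption: 'maybe' votes are ignored when forming the majority, and
    'maybe' answers or undecided images score 0.
    """
    decided = [v for v in votes if v != "maybe"]
    consensus = majority_label(decided)
    if answer == "maybe" or consensus is None:
        return 0
    return 1 if answer == consensus else -1


crowd = ["yes", "yes", "no", "yes", "maybe"]
print(score_answer("yes", crowd))  # player agrees with the majority
print(score_answer("no", crowd))   # player disagrees with the majority
```

A challenge mechanism such as the one mentioned in the text could be layered on top by flagging an image for expert review whenever a penalised player disputes the consensus.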

Recruitment  was  through  the  Geo-Wiki  newsletter,  a  press  release,  social 
media and word of mouth. The game received media coverage on two
occasions during the time it was open, which resulted in a spike in participation;
however, participation decreased soon afterwards, similar to the pattern observed by
Composto et al. (2016). The game had a leader board, which was reset each week,
and  the  top  three  players  in  terms  of  the  total  number  of  classifications  each  week 
were  added  to  a  prize  draw  that  took  place  at  the  end  of  the  game’s  six-month 
period;  thus,  prizes  were  one  incentive  used  to  motivate  the  players.  The  idea 
of  helping  science  was  also  a  strong  message  in  the  game  and  was  meant  as  an 
additional  motivating  factor.  In  total,  more  than  4.5  million  observations  were 
obtained  from  more  than  3,000  players.  A  survey  of  players  was  undertaken  near 
the  end  of  the  game,  which  revealed  that  helping  science,  the  competitive  element 
and  the  beauty  of  the  satellite  images  were  motivating  factors  for  participation. 

Picture Pile is the direct successor to Cropland Capture, so the game mechanics
are similar. However, Picture Pile was made more generic: the basic concept
is that players sort or classify ‘piles of pictures’, where each pile represents a
different task or theme including different land cover types. The idea behind
having different tasks in the game is that there will be more variety for the players,
which may help to retain them for longer. Another major difference between
Picture Pile and Cropland Capture is the added functionality for change detection:
in Picture Pile, players are presented with pairs of images from different
time periods and asked to look for evidence of change over time, e.g.
deforestation (see Figure 3b). Players can also view a map of their contributions and
the contributions of others in real time. Another added feature is the use of
more reference data, where the images have been marked up to explain correct
answers. This is used as both feedback and training for the players, and is
also intended to provide motivation to participate. Each pile has its own leader
board and a chat channel, which makes it very easy for the players and the
organisers to communicate with each other as the game progresses.

Fig. 3: (a) Cropland Capture and (b) Picture Pile.

Recruitment  strategies  were  similar  to  Cropland  Capture.  The  game  was 
launched  in  November  2015.  Almost  4  million  pairs  of  pictures  were  classified. 
Other  piles  will  be  implemented  in  the  future. 


4.2.2  FotoQuest  Austria 

The  second  game,  called  FotoQuest  Austria,  is  quite  different  in  nature  from 
Cropland  Capture  and  Picture  Pile:  instead  of  asking  the  crowd  to  classify 
imagery  online,  the  FotoQuest  Austria  app  is  focused  on  getting  players  to  go 
outside  and  document  the  landscape.  The  game  is  similar  to  geocaching  except 
that  players  do  not  search  for  a  physical  cache.  Instead,  points  are  awarded 
for  documenting  specific  locations  shown  on  the  mobile  device  (see  Figure  4). 
Players are asked to take photographs in the four cardinal directions and then classify
the land cover and land use based on categories in a classification system
developed for the EU LUCAS (Land Use and Cover Area frame Survey).
This  EU  systematic  sample  is  collected  by  professional  surveyors  every  three 
years  in  EU  countries  for  change  detection  purposes,  among  other  reasons,  and 
therefore  provides  authoritative  data  for  comparison  with  the  crowd’s  results. 
The  locations  of  the  LUCAS  points  for  Austria  were  added  to  the  FotoQuest 
Austria  app  along  with  other  locations  to  ensure  sufficient  numbers  of  points 
for  the  players  to  visit. 

The  app  was  specifically  designed  to  adhere  as  closely  as  possible  to  the 
LUCAS  protocol,  and  so  only  allows  photographs  to  be  taken  when  the  user 
is  within  a  certain  distance  of  the  location,  the  mobile  device  is  not  tilted, 
the  compass  indicates  the  correct  direction  and  the  horizon  matches  a  line 
indicated  on  the  app.  This  was  to  ensure  that  the  data  collected  by  the  players 
would  be  of  the  highest  quality  possible,  but  also  to  make  data  collection  as 
easy  as  possible.  The  app  was  launched  in  July  2015  and  ran  over  a  three-month 
period. 
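The capture conditions listed above amount to a gate that must pass before the camera is enabled. The sketch below illustrates that idea only: it is not the FotoQuest Austria code, the function names are hypothetical, and the three thresholds (50 m, 10 degrees of heading error, 5 degrees of tilt) are assumed values chosen for illustration. The tilt check stands in for the horizon-line alignment, which the text does not describe in detail.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used by the haversine formula


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def heading_error_deg(heading, target):
    """Smallest absolute angle between two compass headings, in degrees."""
    return abs((heading - target + 180.0) % 360.0 - 180.0)


def can_take_photo(device, target, heading_deg, required_heading_deg, tilt_deg,
                   max_distance_m=50.0, max_heading_error_deg=10.0, max_tilt_deg=5.0):
    """Return True only if all capture conditions hold.

    device and target are (lat, lon) tuples. The thresholds are hypothetical
    defaults, not values taken from the app or the LUCAS protocol.
    """
    close_enough = haversine_m(*device, *target) <= max_distance_m
    level = abs(tilt_deg) <= max_tilt_deg
    facing = heading_error_deg(heading_deg, required_heading_deg) <= max_heading_error_deg
    return close_enough and level and facing


# A player standing on the point, facing almost due north, device held level.
print(can_take_photo((47.0, 13.0), (47.0, 13.0), 358.0, 0.0, 2.0))
```

Wrapping the heading difference into the range -180 to 180 degrees matters here, because a naive subtraction would treat 358 degrees and 0 degrees as far apart.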

Recruitment  was  via  a  newsletter,  social  media  and  a  more  traditional  media 
campaign,  i.e.  a  press  release  was  issued  and  interviews  were  held  with  the  main 
television  and  radio  stations  in  Austria.  The  app  was  featured  as  ‘app  of  the 
week’  in  the  technology  section  of  the  website  of  Austria’s  main  TV  channel 
and  was  featured  on  an  afternoon  programme  which  demonstrated  how  the 
app  worked.  In  addition  to  the  fun  provided  by  the  competitive  elements  of  the 
game,  additional  motivators  were  interacting  with  the  landscape  and  incentives 
such  as  smartphones  and  tablets,  which  were  awarded  at  the  end  of  the  game. 
Overall,  2300  quests  were  undertaken.  A  second  version,  which  was  developed 
using feedback received from the game, will be launched in 2017.

Fig. 4: (a) Quests in FotoQuest Austria (b) Classifying land cover (c) Geotagged photograph of the point.


4.2.3 The Land Cover Validation Game

The  Land  Cover  Validation  Game  is  a  serious  game  for  validating  land  cover 
(Brovelli et al., 2015). Figure 5 shows the user interface, in which players see a
reference  image  of  the  land  under  investigation.  The  task  is  to  classify  the  30  m 
pixel  shown  within  a  blue  box  on  the  interface.  Depending  on  the  answer,  the 
players  get  points,  badges  and  a  ranking  on  a  global  leaderboard.  The  game 
was  introduced  at  the  FOSS4G  2015  Europe  Conference  and  participants 
played  the  game  during  the  week  of  the  conference.  There  were  68  participants 
engaged  for  a  total  of  more  than  20  hours  of  gameplay.  Overall  1600  pixels  were 
validated. A video summarising the Land Cover Validation Game results was
presented  at  the  ESA  Earth  Observation  Open  Science  event  in  October  2015. 
Prizes  were  offered  as  additional  incentives  at  the  end  of  the  FOSS4G  2015 
Europe Conference. The results showed that involving users in a crowdsourcing
validation campaign with a gaming incentive can be an effective way to
collect  data  and  to  resolve  disagreements  between  two  conflicting  land  cover 
classifications. 


4.3 Embedding VGI in Education

4.3.1 Work Training in High Schools

Work training in schools, which is strongly supported by recent school reforms
in Italy, combines classroom studies with training in the skills required to
make a successful transition from high school to employment, and hence is
aimed at students aged 15 and above. Every year since 2013, the Politecnico di
Milano has organised a week-long internship for 15-20 students; the incentives
for the students to participate are credits towards their course, learning
new technologies and the collection of useful VGI. The collection of data is
preceded by a MOOC called M’appare il mondo (which is a word play in Italian,
as it means ‘the world appears to me’, but becomes ‘mapping the world’ if
the apostrophe is removed) and instructions on how to create a mobile app to
collect the data. This latter step has been done using two applications. The first
is the Open Data Kit (ODK), which is a simple, free, open tool for the Android
operating system; it is very easy to implement forms in ODK for managing the
collection of data, i.e. attributes, photos, videos, audio of the selected features,
etc. The second is Geopaparazzi, which is another free, user-friendly, open
source tool.

Fig. 5: Land Cover Validation Game interface, with a pixel (blue square box) to
be classified (http://bit.ly/foss4game).

During  one  work  training  session,  the  students  developed  an  app  to  collect 
data  on  building  amenities,  e.g.  the  presence  of  ramps  and  stairs  (Figure  6). 
The results from the data collection exercise were then displayed on a website
so  that  the  students  could  view  their  contributions  online  directly  (Figure  7), 
including  those  features  that  do  not  conform  to  Italian  law,  simultaneously 
raising  an  issue  of  importance  for  the  public.  During  another  session,  students 
built  an  app  to  capture  local  biodiversity  (Figure  8). 

In  addition  to  gaining  credits,  the  students  learn  how  to  map  the  world 
around  them  and  collect  data  that  are  of  public  interest,  which  are  displayed 
through  a  WebGIS  interface.  In  the  future  there  are  plans  to  make  connections 
between  the  data  needs  of  government  municipalities  and  of  civil  protection 
agencies  and  the  projects  undertaken  by  the  students,  which  should  provide 
additional  motivation  to  become  involved  in  VGI  projects. 


4.3.2  Humanitarian  MiniMapathons  in  Elementary  Schools 

Mapathons,  also  known  as  ‘armchair’  mapping,  are  events  where  people  come 
together  to  do  mapping  online.  Examples  are  events  related  to  natural  disasters 
and  political  crises,  which  are  supported  and  organised  by  HOT  (Humanitarian 
OSM  Team),  or  events  devoted  to  mapping  places  that  are  not  yet  well  mapped 
or  where  the  most  vulnerable  people  live,  e.g.  the  Missing  Maps  project.  Two 
MiniMapathons  aimed  at  10-year-old  children  from  elementary  schools  were 
organised  by  the  Geomatics  and  Earth  Observation  (GEO)  and  Hypermedia 
Open  Center  (HOC)  Labs  of  the  Politecnico  di  Milano  with  the  support  of 
HOT  and  Missing  Maps.  The  first  event,  in  which  36  children  took  part,  was 
organised  in  Como.  The  second  event,  in  Milan,  saw  212  children  participate. 
Online  registration  for  the  second  event  closed  just  a  few  hours  after  opening, 
having reached the maximum number of students that could be accommodated
in the computer rooms of the Politecnico.

Fig. 6: Example of screenshots from ODK to collect information on building amenities.

Fig. 7: The WebGIS interface showing building amenities in the city.

Fig. 8: Examples of screenshots from the ODK forms developed for the biodiversity app (selection of the species, information about
the selected species, geolocation).

The purpose of the MiniMapathons was to map buildings in the northernmost
part of Swaziland in a project related to malaria elimination. In total 5000
buildings were mapped and the quality was similar to that achieved by adult volunteers
in terms of the shapes digitised and the ability to recognise buildings on the
imagery. The teachers of the elementary schools and the children were highly
motivated as they saw this as a tangible way of helping people in Swaziland,
but at the same time the children acquired competencies in mapping, geometry
and informatics. The second incentive for participation was a purely symbolic
one, i.e. certificates of participation and baseball caps from Politecnico di
Milano.  The  two  events  were  highly  successful  and  appear  to  be  a  good  way  to 
transform  children  into  neogeographers  and  humanitarians  and  to  lead  them 
to  contribute  VGI  for  a  good  cause. 


5  Conclusions 

The  success  of  VGI  is  clearly  down  to  the  participation  of  volunteers  and  of  the 
community  that  supports  the  activities  related  to  spatial  data  collection  and 
mapping.  Hence  volunteer  recruitment,  motivation  and  longer-term  retention 
are  key  issues  when  designing  and  implementing  a  VGI  initiative.  A  number  of 
studies  have  looked  at  typologies  for  characterising  the  nature  of  volunteers  and 
the motivational factors that drive participation. These factors, which were compiled
by Budhathoki and Haythornthwaite (2012), represent a comprehensive list
of  motivations  that  can  be  used  to  further  investigate  reasons  for  participation  in 
current  VGI  initiatives.  They  can  also  be  used  in  the  design  of  new  applications, 
drawing  upon  the  findings  of  Budhathoki  and  Haythornthwaite  (2012)  for  OSM 
volunteers.  Recommendations  and  best  practice  in  recruitment,  motivation  and 
retention  were  then  provided,  drawing  upon  experiences  in  the  broader  field  of 
citizen science. The case studies presented here served to illustrate how recruitment
and motivation are considered in a range of different VGI initiatives.


Considerations of Privacy, Ethics and Legal Issues in Volunteered Geographic Information

Abstract

Today  almost  any  kind  of  User  Generated  Content  (UGC)  can  be  situated  within 
a  geographic  context.  Volunteered  Geographic  Information  (VGI)  can  include 
many  types  of  UGC,  such  as  georeferenced  photographs,  social  media  and  text, 
geographic  data  themselves,  etc.  There  are  legal,  privacy  and  ethical  issues  raised 
by  VGI,  and  at  present  these  are  not  very  well  studied  or  understood  despite  the 
rise in popularity of VGI. This chapter will discuss, investigate and define some
of the most prominent legal, privacy and ethical issues within
VGI. The chapter argues that these issues are not well understood by all of the
actors  in  VGI,  and  in  particular  by  the  producers  of  this  information  as  well  as 
the  users  or  consumers  of  this  new  data  source.  Creating  a  better  understanding 
of  these  issues  will  be  very  important  in  the  future  development  and  evolution 
of  VGI  in  society. 


Keywords 

Data  privacy,  ethics,  legal  issues,  Volunteered  Geographic  Information 


1  Introduction 

The public collection and exchange of geospatial data and information as Volunteered
Geographic Information (VGI) involve many privacy, legal and ethical
issues (Blatt, 2015). These issues are exacerbated by the further distribution
and dissemination of these data by third parties such as libraries, online
data  services,  etc.  In  many  examples  of  VGI,  the  collection  of  geographic  data 
involves  the  use  of  location-based  devices  that  record  the  identities,  positions 
and  movements  of  the  contributors  of  the  information.  Other  examples  of 
VGI,  such  as  social  media,  can  embed  geographic  position  into  imagery,  video, 
sound,  text,  message  data,  etc.  These  data  and  information  objects  can  then  be 
accessed  by  other  citizens,  systems  and  services.  As  crowdsourced  geographic 
information  becomes  more  prevalent  in  society  today,  more  detailed  spatial 
data are constantly being collected from citizens, particularly through the proliferation
of spatially aware devices such as smartphones, smart devices and
sensors. The major issue developing here is that these sources of spatial data can
be combined or linked to other databases and data sources and can potentially
expose sensitive private information, such as the personal data, living habits
and health conditions of the citizen contributors themselves (Shen et al., 2016).
The  further  usage,  storage  and  integration  of  these  data  are  often  the  subject  of 
complex  legal  and  ethical  considerations. 


1.1 The role of the citizen within privacy, legal and ethical issues in VGI

In  this  chapter  we  consider  the  position  of  the  citizen  and  the  VGI  that  they 
can  generate,  and  we  discuss  the  privacy,  legal  and  ethical  issues  relating  to 
the production of this VGI and its further usage. In VGI projects and activities
the citizen is at the very core of almost all aspects of VGI data production,
management, dissemination and usage. Yet we argue in this chapter that there
is still a large gap in our understanding of the privacy, legal and ethical issues
connected to these activities. VGI is still a relatively new field of research;
consequently there is not a great deal of published knowledge or guidelines
available on these issues in VGI.

Although VGI tends to be associated with the collection and supply of explicitly
geographic material, such as OSM (see Chapters 3 and 4 - Mooney and
Minghini, 2017; Touya et al., 2017) or citizen science projects (see Chapters 1
and 2 - Foody et al., 2017; See et al., 2017), it is certainly not limited to this type
of material. By way of a short motivating example, we consider geotagged
photographs. Geotagged photographs are not associated explicitly with VGI, in
the sense that geotagging has become so implicit in the use of smartphones
that most citizens may not be aware of this feature, i.e. that our holiday
photographs, for example, are being geotagged when we take them and upload them
to various social media sites. In this case, the information is volunteered
passively (Fast and Rinner, 2014), without the citizen realising that it is actually
geographic information or that it can be reused and integrated with other geographic
information. Indeed, many citizens are not aware that when, for example, they
contribute geotagged photographs to a citizen science project, they cannot
always predict what the downstream uses of those photographs will
be, given the myriad of mashup tools and technologies available. Overall this
means that although crowdsourced geographic information can be either
volunteered, as in VGI, or harvested in a passive or ambient way (Stefanidis et al.,
2013), for the most part citizens are not fully aware of the additional intelligence
that can be elicited by the powerful combinations of software, cloud computing
and data processing technologies available today. Dienlin and Trepte
(2015) emphasise that even though citizens today have substantial concerns
with regard to their online privacy, they are often engaged in self-disclosing
behaviours that do not adequately reflect their concerns. It is therefore necessary
to attempt to highlight the types of privacy, ethical and legal issues that can
be faced knowingly or unknowingly by citizens involved in VGI today.

The  remainder  of  this  chapter  is  organised  as  follows.  In  Section  2  we  provide 
a brief discussion of the current understanding of the issues of privacy, ethical
and legal frameworks in VGI today by considering simple actor/use case
scenarios.  In  the  three  sections  that  follow  it,  we  discuss  privacy  (Section  3), 
ethics  (Section  4)  and  legal  issues  (Section  5).  In  Section  6  we  summarise  the 
paper  with  some  concluding  remarks  while  highlighting  future  directions  for 
this  work. 


2  Positioning  the  Issues  of  Privacy,  Ethics  and  Legality  in  VGI 

At  the  time  of  writing,  the  issues  of  privacy,  ethics  and  legality  in  VGI  have 
not  received  widespread  or  in-depth  treatment  by  the  research  community.  The 
exact nature of the VGI or data used and the use case to which it is applied may
help to determine which legal, ethical and privacy issues are most prominent.
When information about individual citizens is transferred and presented within
a geographic context, the resulting profile information could be both ‘highly
revelatory and involuntary’ (Scassa, 2013:5), and this can raise important privacy
and ethical issues. The ability for VGI data and information to be mashed
up or integrated with other VGI datasets, proprietary datasets or other information
sources means that new sources of data are created. The privacy, ethics and
legal issues that existed for the original VGI dataset may not have completely
changed as a result of this transformation. In this section, we provide a simple
table (Table 1) that situates privacy, ethics and legal issues for the principal
actors  involved  in  the  collection,  production  and  dissemination  of  VGI,  namely 
citizens, national mapping agencies (NMAs), commercial companies, researchers
and other entities such as small and medium-sized enterprises (SMEs).
While  this  table  is  not  a  fully  comprehensive  overview  of  all  of  the  possible 
actor  interactions  with  privacy,  ethics  and  legal  issues,  it  will  allow  us  to  situate 
our  discussions  in  the  subsequent  sections  of  this  chapter.  Each  cell  in  the  table 
provides a simple example of considerations that are made by the corresponding
actor when producing, collecting, managing, using or disseminating VGI.

As  we  can  see,  there  is  some  overlap  in  the  table.  All  of  the  actors  will  con¬ 
front  and  deal  with  many  of  the  same  privacy,  ethics  and  legal  issues  but  they 
will  respond  to  these  issues  differently.  For  example,  how  an  NMA  deals  with 
the  liability  and  legal  aspects  of  VGI  will  be  different  to  how  an  academic 
researcher  deals  with  the  same  problem.  With  these  examples  in  mind  we  will 
now  look  at  privacy  (Section  3),  ethics  (Section  4)  and  legal  issues  (Section  5) 
in  the  next  three  sections. 


3  Privacy  Issues 

Privacy is probably the best known of the three issues considered in
this chapter; protecting it is very important, and this is no different when considering
VGI. Privacy of user data and information should be considered in the
initial design of VGI systems and projects, as adding privacy protection to existing
systems can be very cumbersome.


3.1  Understanding  Privacy  within  the  VGI  context 

Table 1: Privacy, ethics and legal issues for actors involved in the collection, production and dissemination of VGI.

Private data in the VGI context are any geographic data or information that can
be linked to an individual contributor who created, collected or edited those
data. Thus, to prevent VGI data being used to violate the privacy of individuals,
we need to look at the character of the data and investigate the entire process
from the collection of data to the submission of the VGI to data repositories,
and then onwards to the usage of the data. The most efficient measure is not to
collect private data at all or at least not to collect data that are linkable to individuals.
If linkable private data are collected, it then becomes necessary to set up
protection mechanisms to ensure that the data are only used according to the
original purpose defined before the collection of the VGI started. As VGI data
collections are considered a resource for new and maybe unforeseen usages and
research, it becomes all the more important that these data do not provide linkable
private data about individuals. The question that must be asked is whether
location information in itself is private data or can be linked to individuals:
the answer depends on the location accuracy. Many location data are accurate
enough to be bound to one individual or to a small group of individuals, e.g.
an office or home, and are sometimes even combined with precise time and
date. There is no one-size-fits-all solution here; the collection of point-based
geographic data for a specific purpose may need to have high geographic accuracy.
With this requirement for accuracy comes a possibility that the geographic
features close to the collected points could be used to infer other information.
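To make the link between location accuracy and linkability more concrete, the following sketch shows how rounding coordinates to fewer decimal places enlarges the area that a stored position could refer to. It is an illustration only and is not a method proposed in this chapter: the function names and the sample point are hypothetical, and the figure of roughly 111 km per degree of latitude is a standard approximation.

```python
import math

METRES_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude


def coarsen(lat, lon, decimals):
    """Round a coordinate pair to a fixed number of decimal places."""
    return round(lat, decimals), round(lon, decimals)


def cell_size_m(lat, decimals):
    """Approximate (north-south, east-west) size in metres of one rounding step."""
    step_deg = 10.0 ** -decimals
    north_south = step_deg * METRES_PER_DEGREE_LAT
    # Meridians converge towards the poles, so the east-west extent shrinks.
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west


point = (45.808123, 9.085176)  # hypothetical point, for illustration only
for decimals in (5, 3, 2):
    ns, ew = cell_size_m(point[0], decimals)
    print(decimals, coarsen(*point, decimals), f"about {ns:.0f} m x {ew:.0f} m")
```

At five decimal places a position can single out one building, whereas at two it refers to an area of roughly a square kilometre; whether that loss of accuracy is acceptable depends on the purpose of the data collection, as noted above.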


3.2  Approaches  to  Privacy  Preservation  in  VGI 

The  guiding  principle  of  privacy  protection  is  to  collect  as  little  private  data  as 
possible.  Cho  (2014)  argues  that  there  must  be  privacy  and  legal  protection  for 
volunteers  in  VGI  data  collection  and  projects,  otherwise  ‘the  ensuing  litigation 
may  destroy  the  VGI  model  before  it  reaches  its  full  potential’.  Calderoni  et  al. 
(2015)  remark  that  we,  as  citizens,  are  only  starting  to  grasp  the  privacy  risks 
associated  with  the  constant  tracking  of  our  whereabouts  by  the  very  devices 
that we carry around with us. In order to continue using location-based services
in the future without compromising personal privacy and security, there
is an urgent need for privacy-friendly applications and protocols.

Some literature exists on privacy concerns in VGI and on possible solutions.
There are a number of prevalent technological approaches, including the
popular approach of blurring or fuzzing information so that it departs from
the original data (Tuther et al., 2009). Anonymising data and selectively
revealing information according to volunteer preference is another approach
(Kim et al., 2013). In the Geographic Privacy-Aware Knowledge Discovery
and Delivery (GeoPKDD) project, Giannotti and Pedreschi (2008) investigated
various scientific and technological issues of mobility data, open problems and
roadmaps. They found that privacy issues related to Information and
Communications Technology (ICT) can only be addressed through an alliance of
technology, legal regulations and social norms. In the meantime, increasingly
sophisticated privacy-preserving data mining techniques are being studied and
need to be further developed. These approaches aim to achieve appropriate
levels of anonymity by means of controlled transformation of data and/or
patterns with limited distortion, avoiding undesired side effects on privacy
while preserving the possibility of discovering useful patterns and trends.
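
The blurring or fuzzing approach mentioned above can be illustrated with a
minimal sketch. It is not taken from any of the cited works; the function
names, the grid cell size and the maximum offset are assumptions chosen for
illustration, and a spherical Earth is assumed. Two simple variants are shown:
snapping a coordinate to the centre of a coarse grid cell, and displacing it by
a bounded random offset.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def snap_to_grid(lat, lon, cell_deg=0.01):
    """Coarsen a coordinate by snapping it to the centre of its grid cell.

    cell_deg is a hypothetical cell size in decimal degrees.
    """
    snapped_lat = (math.floor(lat / cell_deg) + 0.5) * cell_deg
    snapped_lon = (math.floor(lon / cell_deg) + 0.5) * cell_deg
    return snapped_lat, snapped_lon


def jitter(lat, lon, max_offset_m=500.0, rng=None):
    """Displace a coordinate by a random offset of at most max_offset_m metres."""
    rng = rng or random.Random()
    # The square root spreads points uniformly over the disc instead of
    # clustering them near the true location.
    distance = max_offset_m * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = (distance * math.cos(bearing)) / EARTH_RADIUS_M
    dlon = (distance * math.sin(bearing)) / (
        EARTH_RADIUS_M * math.cos(math.radians(lat))
    )
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


if __name__ == "__main__":
    print(snap_to_grid(48.8566, 2.3522))
    print(jitter(48.8566, 2.3522, rng=random.Random(1)))
```

Both variants trade geographic accuracy for privacy, which is exactly the
tension described in the preceding section: the coarser the cell or the larger
the offset, the less useful the point may be for a purpose that requires high
accuracy.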

The most common question asked about privacy in VGI is whether data
collection services and systems can be enhanced so that the spatial data
collected or generated by a contributor cannot be traced back to that individual
contributor. The contributor should not be identifiable through their
contributions to a VGI project; more precisely, the contributor should be
identifiable within the VGI project (such as through a pseudonymous username in
a project) but their contributions should not be linkable to the personal and
private data and information of their actual person. There is a need to consider
the sensitivity of the privacy issues within contributions to VGI: are there
situations where a contributor would prefer not to be linked to a set of
contributions or a single contribution? In the capture of aerial imagery,
geotagged photographs and street-level photography, people can also potentially
be identifiable as subjects. There are thus many privacy issues, and these
issues have not yet been adequately addressed.
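
One way to make a contributor identifiable within a project but not linkable
to their actual person is to derive a project-specific pseudonym from a keyed
hash. The sketch below is only an illustration under stated assumptions: the
function name `project_pseudonym`, the secret `project_key` held by the project
operator and the truncation length are all hypothetical, and the approach does
nothing to prevent re-identification from the geographic content of the
contributions themselves.

```python
import hashlib
import hmac


def project_pseudonym(contributor_id: str, project_key: bytes, length: int = 16) -> str:
    """Derive a stable, project-specific pseudonym from a contributor identifier.

    The same contributor always gets the same pseudonym within one project,
    but pseudonyms from projects with different keys cannot be matched
    without knowing those keys.
    """
    digest = hmac.new(
        project_key, contributor_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    key_a = b"secret key of project A"  # hypothetical per-project secrets
    key_b = b"secret key of project B"
    print(project_pseudonym("alice@example.org", key_a))
    print(project_pseudonym("alice@example.org", key_b))
```

Using a different key per project means that a set of contributions in one
project cannot be joined to contributions in another simply by comparing
usernames.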


3.3  Privacy  for  non-human  subjects  in  VGI 

Privacy  can  also  be  related  to  non-human  subjects  in  VGI.  Suppose  there  is 
a  crowdsourcing  or  VGI  campaign  in  the  area  of  biodiversity  and  a  very  rare 
or  precious  plant  species  is  found  and  geolocated.  To  protect  this  species  (and 
potentially  its  habitat),  this  information  needs  to  be  kept  private.  But  other 
species  identified  by  the  campaign  may  not  need  privacy.  This  example  could 
also  extend  to  similar  scenarios  for  a  geological  survey.  Suppose  a  contributor 
identifies  the  potential  location  of  a  precious  metal;  there  might  be  very  good 
reasons why this location and find must be kept private. The discussions
above of both human privacy and the privacy of non-human subjects
raise the question of whether contributions need to be checked manually for
these privacy issues: is it necessary to moderate contributions for their privacy
characteristics and not just their data quality aspects? The moderation question
in VGI already raises many obstacles to its implementation (Neis and Zielstra,
2014). It might not be possible to automate this process to include the
consideration of privacy aspects.

While  the  focus  above  has  been  on  the  individual  VGI  contributor,  it  is  often 
the  case  that  contributors  to  VGI  projects  are  institutions  and  organisations 
that  provide  datasets  for  VGI;  institutions  or  organisations  must  also  be  aware 
of  and  familiar  with  the  licence  terms  within  which  they  provide  content. 


4  Ethics  Issues 

As far back as the work of Mitchell and Draper (1983), the issue of ethics
has been a subject of research discussion in geography. In their work, they
indicate that geographers have not always been sensitive to ethical issues, and
that geography researchers have to balance the obligations of understanding
and knowledge with those of respecting the dignity and integrity of
research subjects.


4.1  Key  Ethical  Issues  in  VGI 

In  VGI,  the  citizens  who  collect,  manage  and  work  with  the  data  are  very  often 
the  subject  of  research.  Little  work  has  been  carried  out  specifically  on  eth¬ 
ics  in  VGI.  Many  studies  on  contributors  have  been  performed  and  published 
in  the  literature  in  the  last  few  years  (Granell  and  Ostermann,  2016).  Hartter 
et  al.  (2013)  outline  that  ethical  standards  in  science  require  that  research  with 
human  subjects  respect  individuals,  commit  to  nondisclosure  of  participants’ 
identities,  minimise  potential  harm  and  ensure  that  the  benefits  and  burdens  of 
research  be  fairly  distributed,  and  that  subjects  be  informed  of  the  full  nature 
of the research so they can decide against participation if they wish. Ethical
standards and plans now usually require ethics approval from funding review
boards and research authorities. Luppicini (2010) introduces the term technoethics
to  refer  to  an  interdisciplinary  study  of  technological  impacts  on  the  morals 
and  ethics  in  a  society.  Ethical  conduct  and  social  responsibility  are  important 
factors  within  contemporary  society  to  maintain  respect  and  harmony.  Lingel 
and  Bishop  (2014)  consider  the  ‘labour  ethics’  surrounding  VGI  in  terms  not 
only  of  what  is  technically  possible,  but  of  what  is  also  ethically  responsible. 
The  authors  argue  that  the  introduction  of  ethical  considerations  should  not 
discourage  the  production  of  VGI  within  volunteer  communities;  rather,  those 
involved  in  instigating  this  VGI  or  managing  it  must  give  careful  consideration 
to  how  these  communities  are  managed. 

Ethical considerations apply to both the data producers (the volunteers)
and the users (VGI project coordinator/platform operator). As before,
the volunteers have to consider and adopt an ethical approach to their
reporting of information and data. For example, in a disaster or crisis
situation, this involves not engaging in the false reporting of damage,
casualties, fatalities, etc. Indeed, volunteers must give ethical consideration
to any information and data they provide that can lead to action by authorities
such as emergency services (Haworth and Bruce, 2015). Volunteers wilfully
contributing false or misleading data or information not only undermine the VGI
project in which they are involved, but also cause further mistrust and suspicion
among users about the quality and usability of VGI in general. From the
coordinator side, the volunteer must be made aware of the purpose of the project
that they are volunteering for; voluntary submissions must not be used for
commercial purposes, or shared with other entities for different purposes,
without the consent of the volunteers. At this point, it is clear that the
consideration of ethics combines the issues of data privacy and the legal aspects
of VGI; these issues are not easily disentangled from each other.


4.2  Summary  of  Ethical  Issues 

As communicated by Sula (2016), the key ways to respect ethics in
data-based research include involving participants throughout the research
process, avoiding collecting information that should remain private, notifying
participants  of  their  inclusion  and  providing  them  with  options  to  correct 
or  delete  personal  information,  and  using  public  channels  to  disseminate 
research,  such  as  Open  Data.  Ethical  research  has  the  least  possible  impact 
on  subjects,  asking  or  collecting  only  as  much  as  is  needed  to  answer  its  ques¬ 
tions.  In  the  case  of  VGI  research,  the  researchers  involved  may  not  know 
exactly  what  knowledge  they  are  trying  to  extract  or  patterns  they  are  trying 
to  uncover;  the  data  are  being  used  in  an  exploratory  way.  In  these  circum¬ 
stances,  it  seems  nearly  impossible  to  inform  participants  of  all  anticipated 
harms  and  benefits  in  advance. 

Today,  datasets  collected  through  VGI  and  crowdsourced  means  have  a 
potentially  very  long  lifespan.  Given  the  longevity  of  these  datasets  and  their 
potential  interoperability  and  integration  with  other  datasets,  researchers  and 
scientists must, in general and where possible, avoid data with personally
identifiable information or information that could later be used to identify
participants in connection with other datasets, e.g. screennames, usernames, etc.
The potential for unintended consequences is high, but it is entirely mitigated
when no personally identifiable information is collected in the first place
(Sula, 2016).
The  integration  of  many  datasets  with  each  other  creates  a  brand  new  dataset 
that  is  essentially  an  unknown  quantity  in  terms  of  its  ethical  characteristics. 
In  this  situation  the  creators  of  these  new  datasets  must  be  conscious  of  how 
the  new  dataset  will  be  used,  distributed,  analysed  and  even  itself  potentially 
integrated  with  other  datasets  in  the  future. 


5  Legal  Issues 

In  Olteanu-Raimond  et  al.  (2017),  one  of  the  six  obstacles  described  for  NMAs 
in  using  VGI  is  the  legal  issue.  The  most  relevant  of  these  legal  issues  in  using 
VGI are intellectual property and liability. With the growing trend towards
open data, more and more public bodies have adopted open data policies. Generally
there are two concepts of open data: one concept means that ‘data and content
can be freely used, modified, and shared by anyone for any purpose’ and the
other involves open source licensing applied to software. Intellectual property
concerns both data producers and users. From the producers’ point of view, it
defines ownership rights of the data, licences, and how data can be used and
under which conditions. From the users’ point of view, it defines rules to enrich
and disseminate the data.


5.1  Liability  as  a  Legal  Issue  in  VGI 

Concerning  liability,  the  main  question  is  that  of  who  is  liable  and  under  what 
circumstances  if  harm  is  caused,  economic  loss  happens  or  incorrect  deci¬ 
sions  are  taken.  This  issue  is  linked  closely  to  the  concerns  with  data  quality, 
i.e.  precision  and  accuracy.  Liability  can  be  different  from  country  to  country 
and  from  product  to  product.  When  crowdsourced  data  are  used  by  a  legally 
mandated  organisation  such  as  an  NMA,  what  are  the  implications  for  that 
organisation?  Does  the  NMA  take  all  of  the  legal  responsibility?  Is  there  any 
citizen  responsibility?  Should  there  be?  Indeed,  Cho  (2014:10)  argues  that  there 
must  be  legal  protection  for  volunteers  in  VGI  data  collection  and  projects, 
otherwise  ‘the  ensuing  litigation  may  destroy  the  VGI  model  before  it  reaches 
its  full  potential’.  Rak  et  al.  (2012)  studied  the  integration  of  VGI  into  Canadian 
authoritative  datasets  from  the  liability  point  of  view  by  proposing  four  primary 
risk management techniques to manage risks resulting from such an incorporation.
One  of  the  most  important  and  difficult  of  these  risk  management  techniques 
sees  the  information  provider  being  required  to  show  that  steps  were  taken  to 
ensure  the  accuracy  of  VGI  that  has  been  integrated  into  their  data. 


5.2  Legal  Issues  Surrounding  Data  Licence  Types 

The  type  of  licence  applied  to  VGI  data  for  their  subsequent  dissemination  has 
an  important  influence  on  their  usage.  There  are  three  main  types  of  open  data 
licences: 

•  Share  alike  licences,  which  require  the  derived  datasets  to  be  released  with 
the  same  licence  as  the  original  one(s);  the  most  famous  such  licence  in  the 
area  of  geographic  information  is  the  Open  Database  License  (ODbL)  used 
by  OpenStreetMap  (OSM). 

•  Open  licences,  which  allow  any  type  of  use  provided  the  citation  of  the  data 
provider  is  given;  it  allows,  for  instance,  commercial  use  of  derived  datasets. 
An  example  of  such  a  licence  is  the  French  ‘Licence  ouverte’,  which  is  used 
to  release  governmental  open  data  in  France. 

•  Limited  use  open  licences,  which  limit  the  use  of  the  dataset  to  personal 
use,  or  non-commercial  use.  For  instance,  the  IGN  (the  French  mapping 
agency)  releases  its  datasets  openly  for  research  and  education  purposes. 
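
The interaction between these three licence types when datasets are merged can
be sketched in code. The rules below are a deliberately simplified assumption
made for illustration (real compatibility depends on the exact text of each
licence), and the names `LicenceType` and `derived_licence` are hypothetical.
The sketch assumes that share alike propagates to the merged dataset, that a
limited use restriction also propagates, and that combining the two is a
conflict because a share alike licence such as the ODbL would grant uses that
the limited use licence withholds.

```python
from enum import Enum


class LicenceType(Enum):
    SHARE_ALIKE = "share-alike"    # e.g. ODbL
    OPEN = "open"                  # attribution only, e.g. Licence ouverte
    LIMITED_USE = "limited-use"    # personal or non-commercial use only


def derived_licence(inputs):
    """Return the licence type a merged dataset could carry, or None on conflict.

    Simplified, assumed rules; not a statement of what any real licence permits.
    """
    kinds = set(inputs)
    if not kinds:
        raise ValueError("at least one input licence is required")
    if LicenceType.SHARE_ALIKE in kinds and LicenceType.LIMITED_USE in kinds:
        return None  # the two sets of obligations cannot both be met
    if LicenceType.SHARE_ALIKE in kinds:
        return LicenceType.SHARE_ALIKE
    if LicenceType.LIMITED_USE in kinds:
        return LicenceType.LIMITED_USE
    return LicenceType.OPEN


if __name__ == "__main__":
    print(derived_licence([LicenceType.OPEN, LicenceType.SHARE_ALIKE]))
    print(derived_licence([LicenceType.SHARE_ALIKE, LicenceType.LIMITED_USE]))
```

Under these assumptions, a conflict of the second kind is what forces a
project to keep separately licensed datasets rather than a single merged one.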

The choice of a licence conveys a political or commercial strategy, and the
strategies of these licences might not be compatible. So what happens when projects
with different strategies plan to merge their datasets? And what happens when
one or more of these datasets are from VGI? It is useful at this point to provide
a real-world example. The most typical case regarding geographic information
is the following: how is it possible to integrate non-ODbL open data into
OSM? The case of the French national address dataset is interesting to study, as
it plans to integrate data from the IGN, which is a governmental administration,
the French Post Office company, which is a public limited company, and
OSM (Figure 1). All three already have address datasets updated by
crowdsourcing communities. They also have different licensing strategies. OSM uses
the ODbL while the French Post Office would prefer a licence that allows
commercial use of derived datasets. Figure 1 shows a possible integration scenario
for the architecture of the project and the licensing strategy. Two new datasets
are created in this scenario: a common and central address dataset, and a copy
of this dataset using the OSM technologies (in RDF format). The OSM-like
copy is under the ODbL licence, which allows OSM contributions regarding
addresses to be directly included, and the other way around. The common
address dataset is under two licences: a limited open licence that only allows
personal and non-commercial use of the data, and a charged licence for other
uses. The OSM-like dataset is only a partial copy, as the French Post Office
does not want to release all the information of its dataset (e.g. the standardised
spelling of addresses). A quality control step is included in the common dataset
to improve contributions through both field survey (by mail carriers and IGN
surveyors) and automatic tools.

Fig. 1: Possible architecture to mix licences and dissemination strategies
between OSM, the IGN and private companies.

In this scenario, different access desks are proposed for citizens, derived from
existing tools. The IGN desk, which fills the common address dataset, is
dedicated to community-sourcing (from city administrations, firefighters, police
officers, etc.); the Post Office desk, which also fills the common address dataset,
is dedicated to citizens and administrations that report updates on addresses;
and the OSM desk is based on OSM software, such as iD (note 1), and could fill
both the common dataset and the OSM-like dataset. The tricky part of the
integration scenario is that the contributions go to both datasets at the same
time, so that the common dataset is not ‘infected’ by the ODbL. This architecture
seeks to attract OSM contributors to this project, but the contributors should
accept that their contribution will fill both address datasets, which have
different licences.


5.3  Summary  of  Legal  Issues  in  VGI 

In  summary,  the  legal  issues  in  VGI  must  be  considered  from  the  side  of  both 
the  data  producers  or  collectors  (i.e.  the  volunteers  or  citizens)  and  the  users 
or  facilitators  (i.e.  VGI  project  management,  VGI  data  portal  operators)  of  the 
data.  From  the  position  of  the  volunteer,  their  legal  role  and  their  contribution 
may  not  always  be  clearly  defined  and  this  can  lead  to  potentially  exposing 
them  to  legal  problems.  On  the  other  hand,  if  a  data  provider  or  data  portal 
only  facilitates  the  transfer  or  access  to  VGI  data,  then  who  carries  the  legal 
responsibilities  related  to  consequences  of  future  use  of  these  data?  For  exam¬ 
ple,  submissions  from  volunteers  to  a  VGI  project  may  indicate  natural  hazards 
in  a  particular  location  or  the  vulnerabilities  of  a  property.  This  (potentially 
false)  information  could  be  used  by  an  insurance  company  to  raise  insurance 
premiums. Then, from the VGI project coordinators’ side, to what extent must
a portal/project coordinator provide a disclaimer about legal aspects? Under
what circumstances can a portal be held liable for omissions (e.g. damaged
areas not mapped during a disaster) or mistakes (e.g. infrastructure shown to
be intact that is actually broken, leading to inaccessibility)? In
reality, there are no clear-cut answers to these questions at this point in time.
Christin  et  al.  (2011)  indicate  that  the  research  community  should  provide 
open  datasets  that  can  serve  as  a  baseline  for  performance,  security  and  legal 
evaluation  in  order  to  begin  addressing  these  critical  issues. 


6  Conclusions  and  Future  Directions 

In  this  chapter  we  have  provided  a  brief  overview  and  discussion  of  privacy, 
ethics  and  legal  issues  in  the  production,  collection,  storage,  dissemination  and 
integration  of  VGI.  These  are  complex  issues.  As  VGI  continues  to  grow  rapidly 
in terms of popularity amongst contributors and as an alternative or
complementary source of spatial data for researchers, authoritative agencies,
commercial companies, etc., these issues will become more prevalent and urgent. In
their  study  of  privacy  concerns  in  the  use  of  location-based  services  such  as 
social  media,  Fodor  and  Brem  (2015)  found  that  privacy  concerns  do  influ¬ 
ence  citizen  adoption  of  these  services  but  that  the  answer  is  more  complex 
and  multi-faceted  than  just  a  simple  case  of  trusting  such  services.  Even  now, 
with  VGI,  new  technologies  are  emerging  all  of  the  time,  offering  citizens  new 
and  exciting  ways  to  generate  and  collect  spatial  data.  Luppicini  and  So  (2016) 
argue  that  in  technologies  such  as  the  use  of  drones  for  collecting  data  and 
information,  a  lack  of  understanding  of  the  factors  of  ethics  and  privacy  often 
causes the prohibition of the use of these technologies. A lack of
understanding rarely mitigates the issues, but it can hinder the development of
devices and technologies that can be used in many positive ways.

When  VGI  is  collected  and  subsequently  disseminated,  it  can  be  reused,  dis¬ 
played,  integrated  and  transformed  in  a  myriad  of  ways.  The  model  for  under¬ 
standing  what  happens  with  data  once  they  are  released  by  the  individual,  or 
what this means on an aggregate scale, is thus fluid and uncertain (Hallinan
et  al.,  2012).  In  reality,  citizens  often  have  a  poor  basis  on  which  to  form  a  picture 
of  the  data  relationships,  the  consequences  and  the  issues  in  VGI.  Citizens  often 
struggle  to  comprehend  how  these  issues  add  to  the  importance  of  these  data 
flows  in  relation  to  other  social  structures  or  issues.  Hallinan  et  al.  (2012:271) 
go  on  to  argue  that  due  to  the  complexity  of  the  issues  of  privacy,  ethics  and 
legality,  ‘it  appears  that  the  public  are  being  forced  to  act  in  an  environment  they 
have  little  template  for  approaching’.  The  concepts  of  VGI  and  Open  Data  are 
still  relatively  new.  Consequently,  it  will  take  time  for  citizens  to  become  deeply 
familiar  with  the  issues  discussed  above. 

Christin  et  al.  (2011)  argue  that  at  the  moment,  privacy  research  usually 
operates  on  either  private  or  synthetic  datasets.  These  datasets  do  not  allow 
new  mechanisms  for  privacy,  ethical  and  legal  considerations  to  be  harmonised 
or benchmarked against. In any case, Torra and Navarro-Arribas (2014:277)
indicate, after their wide-scale review of the issues of data privacy online, that
the  development  of  methods  to  protect  citizens  ‘has  to  take  into  account  the 
specificities  of  the  data  involved’.  No  two  VGI  datasets  are  the  same;  indeed,  it 
can  be  the  case  that  within  a  VGI  dataset  different  objects  might  be  collected 
by  different  citizens  in  different  circumstances.  VGI  is  an  exciting  and  power¬ 
ful  source  of  geospatial  data  that  is  likely  to  continue  growing.  Understanding 
how  to  protect  the  citizen  while  enhancing  their  role  in  the  production  of  VGI 
is a big research challenge for the next few years. Indeed, this research issue
has barely been tackled by the research community to date. Protection of the
citizen’s privacy and ethical rights under suitable legal conditions is very
important. However, the frameworks or structures developed to implement these
protections must not place insurmountable barriers in the way of citizen
participation in VGI. The act of being involved in VGI as citizens
should continue to be a leisure activity pursued by those motivated to
volunteer. There is a fine balance between, on the one hand, encouraging and
fostering participation in VGI activities and, on the other hand, ensuring that
the complex issues of privacy, ethics and legality are understood and adhered to
by a potentially large cohort of individuals (Rak et al., 2012; Torra and
Navarro-Arribas, 2014). Finding this balance will have a major influence on the
future trajectory of VGI.


Notes 


1  http://ideditor.com/ 

Abstract

Uncertainty  over  the  data  quality  of  Volunteered  Geographic  Information 
(VGI)  is  the  largest  barrier  to  the  use  of  this  data  source  by  National  Mapping 
Agencies  (NMAs)  and  other  government  bodies.  A  considerable  body  of  litera¬ 
ture  exists  that  has  examined  the  quality  of  VGI  as  well  as  proposed  methods 
for  quality  assessment.  The  purpose  of  this  chapter  is  to  review  current  data 
quality  indicators  for  geographic  information  as  part  of  the  ISO  19157  (2013) 
standard  and  how  these  have  been  used  to  evaluate  the  data  quality  of  VGI  in 
the  past.  These  indicators  include  positional,  thematic  and  temporal  accuracy, 
completeness,  logical  consistency  and  usability.  Additional  indicators  that  have 
been proposed for VGI are then presented and discussed. In the final section
of the chapter, the idea of integrated indicators and workflows of quality
assurance that combine many assessment methods into a filtering system is
highlighted as one way forward to improve confidence in VGI.


Keywords 

Spatial  data  quality,  ISO  19157,  positional  accuracy,  thematic  accuracy,  usability 


1  Introduction  and  Background 

Quality  is  a  key  component  of  any  dataset.  Decisions  on  using  a  spatial  data¬ 
set  for  a  certain  purpose  are  heavily  based  on  quality  measures  such  as  posi¬ 
tional  accuracy,  thematic  quality,  completeness  and  usability.  This  also  applies 
to  Volunteered  Geographic  Information  (VGI),  a  new  and  growing  source  of 
data,  contributed  by  citizens,  that  can  take  many  different  forms,  e.g.  geotagged 
photographs  through  sites  such  as  Panoramio  and  Flickr,  online  maps  such  as 
OpenStreetMap  (OSM)  and  Wikimapia,  and  3D  VGI  such  as  OSM-3D  and 
OSM2World.  For  a  more  detailed  overview  of  the  diverse  range  of  current  VGI 
data sources, see Chapter 2 (See et al., 2017).

A  set  of  elements  is  specified  in  the  ISO  19157  standard  for  spatial  data 
quality  (ISO,  2013).  This  framework  adequately  serves  communities  such  as 
National  Mapping  Agencies  (NMAs),  which  have  professional  staff  follow¬ 
ing  rigorous  protocols  and  multiple  quality  control  processes  so  as  to  produce 
high-quality  products  of  a  minimum  acceptable  specification.  However,  these 
spatial  data  quality  guidelines  have  not  been  developed  with  any  consideration 
of  the  nature  of  VGI.  The  data  quality  of  VGI  brings  new  challenges  into  the 
quality  assessment  field,  and  therefore  it  is  possible  to  consider  VGI  data  qual¬ 
ity  using  this  standard  and  then  recommend  additional  measures  that  take  the 
specific  nature  of  VGI  into  account. 

One characteristic of VGI is its heterogeneous nature, e.g. there is often a
spatial bias in the information, with more data collected in urban than in rural
areas (Estima et al., 2014; Neis and Zielstra, 2014; Ma et al., 2015) or a bias
towards specific types of features, influenced by the interests of the volunteers
(Begin et al., 2013). Moreover, even inside the urban fabric, the more popular
and touristic areas receive more attention, and thus more data with higher
detail, than obscure and fairly unknown urban areas (Antoniou and Schlieder,
2014; Estima et al., 2014). These biases can be further influenced by access to,
and knowledge of, digital resources, the language of the VGI application,
cultural differences and how much time users have to participate (Holloway et al.,
2007; Zook and Graham, 2007).

Another issue with VGI is the lack of rigorous data specifications of the kind
that accompany more authoritative Geographic Information (GI), an issue
which can lead to heterogeneous data quality (Hochmair and Zielstra, 2012).
While collaborative mapping can improve data quality to a certain extent
(Haklay et al., 2010), frequent changes to the same features can deteriorate the
overall quality and usability of the data; examples of this phenomenon can be
found in location-based services (Mooney and Corcoran, 2012) and gazetteers
(Antoniou et al., 2016b). Moreover, the fact that there is no standard way in
which the data are collected, as well as data specifications that vary between
and also within initiatives, means that quality will vary over space and time; see
e.g. OSM, where free tagging of features is possible.

For  some  types  of  VGI  applications,  such  as  OSM  or  Instagram,  the  volun¬ 
teers  may  contribute  information  in  any  location.  However,  some  VGI  cam¬ 
paigns  have  been  promoted  with  a  more  specific  objective  in  mind  and  conse¬ 
quently  have  employed  a  statistical  sampling  system  to  make  sure  that  the  data 
are  collected  where  they  are  needed,  that  a  more  global  coverage  is  obtained 
or  that  more  accurate  results  are  achieved.  These  campaigns  have  been  pro¬ 
moted  to  citizen  scientists,  eliciting  their  help  with  specific  goals,  e.g.  quantify¬ 
ing  human  impact  (See  et  al.,  2013)  or  assessing  cropland  and  other  land  use 
area  estimates  (Waldner  et  al.,  2015),  or  even  collecting  photographs  around 
the world, such as for the Degree Confluence Project (note 1). Some of the
statistical sampling systems used include systematic allocation of points in a
grid, and random or stratified random samples, whether these are points, polygons
or pixels. One of the key advantages of using statistical samples is a
stricter control on what data the users can contribute and where, allowing for
more straightforward measures of quality, e.g. through estimation of statistical
uncertainties and determination of possible sample augmentation to reduce
these  uncertainties.  Additionally,  and  depending  on  the  design  of  these  sys¬ 
tems,  comparisons  between  users  are  easier  to  do,  since  the  location  is  fixed 
and  shared  between  the  contributors.  A  key  disadvantage  of  predetermined 
sampling  systems,  however,  might  be  precisely  their  strictness,  e.g.  bounding 
the  users  to  a  pre-defined  set  of  geographic  locations,  with  usually  little  pos¬ 
sibility  of  reporting  local  and  sometimes  more  relevant  characteristics  from  the 
surroundings  that  might  contribute  to  a  better  understanding  and  achievement 
of  a  given  objective;  this,  in  itself,  could  be  detrimental  to  the  quality  of  the 
information  by  providing  information  that  is  very  precise  but  off-target. 
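
The sampling systems described above can be sketched briefly. The example below
is illustrative only and is not drawn from any of the cited campaigns: the
function names are hypothetical, coordinates are treated as planar, and each
stratum is assumed to be a simple bounding box (in practice strata would
usually be polygons such as land cover classes or administrative units).

```python
import random


def systematic_grid(min_x, min_y, max_x, max_y, step):
    """Systematic sample: one point per grid cell of size `step`, at the cell centre."""
    nx = int((max_x - min_x) // step)
    ny = int((max_y - min_y) // step)
    return [
        (min_x + (i + 0.5) * step, min_y + (j + 0.5) * step)
        for i in range(nx)
        for j in range(ny)
    ]


def stratified_random(strata, n_per_stratum, seed=None):
    """Stratified random sample: n uniform random points inside each stratum.

    `strata` maps a stratum name to its (min_x, min_y, max_x, max_y) box.
    """
    rng = random.Random(seed)
    return {
        name: [
            (rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
            for _ in range(n_per_stratum)
        ]
        for name, (min_x, min_y, max_x, max_y) in strata.items()
    }


if __name__ == "__main__":
    print(len(systematic_grid(0.0, 0.0, 10.0, 10.0, 2.5)))
    print(stratified_random({"urban": (0, 0, 2, 2), "rural": (2, 0, 10, 10)}, 3, seed=7))
```

Because every volunteer is assigned locations from the same fixed set, several
contributors can be given the same point, which is what makes the comparisons
between users mentioned above straightforward.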

VGI  quality  has  been  the  subject  of  a  considerable  amount  of  research,  par¬ 
ticularly  with  regard  to  the  quality  of  OSM.  For  example,  a  number  of  studies 
have  tried  to  assess  VGI  quality  based  on  comparisons  with  authoritative  data 
provided  by  NMAs  or  commercial  companies  (e.g.  Girres  and  Touya,  2010; 
Haklay,  2010;  Zielstra  and  Zipf,  2010;  Antoniou,  2011;  Estima  and  Painho, 
2013;  Fan  et  al.,  2014).  These  comparisons  are  based  on  the  belief  that  authori¬ 
tative  data  are  always  of  a  minimum,  acceptable  quality  and  created  according 
to high standards and that it is thus reasonable to assume that authoritative
data  can  play  the  role  of  reference  datasets  during  a  quality  evaluation  pro¬ 
cess  of  VGI  datasets.  In  these  studies,  a  number  of  methods  are  used,  e.g.  data 
matching,  generalisation  evaluation,  etc.,  that  consider  different  elements  of 
data  quality  such  as  positional  or  thematic  accuracy.  However,  the  application 
of  these  methods  is  not  always  possible,  because  of  limited  data  availability, 
licence  restrictions  or  the  lack  of  access  to  costly  authoritative  datasets.  Moreo¬ 
ver,  as  VGI  datasets  are  often  richer  than  their  authoritative  counterparts,  and 
will  only  continue  to  increase  in  richness,  the  use  of  authoritative  data  as  a  ref¬ 
erence  dataset  for  quality  evaluation  may  no  longer  be  the  most  valid  choice.  In 
some  parts  of  the  world,  VGI  is  more  complete  and  more  accurate  than  author¬ 
itative  datasets  (Neis  et  al.,  2011;  Vandecasteele  and  Devillers,  2015),  which 
poses  challenges  to  the  assessment  of  VGI  data  quality. 

This  chapter  provides  a  review  of  data  quality  indicators  for  geographic  infor¬ 
mation  that  are  part  of  the  ISO  19157  (2013)  standard,  of  how  these  have  been 
used  to  evaluate  the  data  quality  of  VGI  in  the  past  and  of  other  approaches 
that  could  be  used.  Additional  indicators  that  have  been  proposed  for  VGI  in 
particular  are  also  presented,  as  well  as  initiatives  to  develop  quality  assessment 
frameworks  combining  several  quality  measures  and  indicators. 


2  Measures  and  Indicators  to  Assess  VGI  Quality 

ISO  19157  is  the  latest  release  (2013)  of  a  data  quality  standard  among  the  inter¬ 
nationally  known  standards  for  describing  spatial  data  quality,  e.g.  the  Inter¬ 
national  Cartographic  Association  (ICA),  Federal  Geographic  Data  Committee 
(FGDC) and European Committee for Standardization (CEN) standards. It attempts to
define  a  set  of  measures  for  evaluating  and  reporting  data  quality.  The  concep¬ 
tual  model  for  geodata  quality  as  specified  in  ISO  19157  represents  data  quality 
by  a  series  of  data  quality  elements,  e.g.  positional  accuracy.  Each  data  quality 
element  is  then  further  described  by  measures  that  allow  the  data  quality  to  be 
evaluated,  and  the  results  of  the  evaluation  can  be  documented  and  reported 
to  any  interested  party.  The  ISO  19157  standard  does  not  attempt  to  define  any 
minimum  acceptable  levels  of  quality  for  spatial  data,  and  it  considers  only  con¬ 
ventional  datasets  without  proposing  any  data  quality  elements  or  measures 
specific  to  VGI.  The  next  subsection  outlines  the  different  spatial  data  quality 
elements  that  are  part  of  ISO  19157  and  how  they  can  be  used  to  measure  VGI 
quality,  drawing  upon  examples  from  the  literature  and  VGI  practices. 


2.1  ISO  Quality  Measures  Applicable  to  VGI 

The first five spatial data quality elements of ISO 19157 (Sections 2.1.1 to 2.1.5)
are  focused  on  the  quality  of  the  product  from  a  producer’s  point  of  view,  or 
on  what  is  termed  the  ‘internal  quality’  of  a  dataset  (Devillers  and  Jeansoulin,
2006).  The  sixth  spatial  data  quality  element  (Section  2.1.6)  is  focused  on  the 
user  needs  and  requirements  and  is  referred  to  as  the  ‘external  quality’  of  a 
dataset  (Devillers  and  Jeansoulin,  2006).  Thus  there  may  be  situations  where 
the  internal  quality  is  high  (i.e.  it  is  produced  according  to  a  set  of  specifica¬ 
tions)  but  the  external  quality  poor  (i.e.  it  does  not  fulfil  a  particular  purpose 
from  a  user’s  perspective).  The  same  will  apply  to  VGI,  so  the  fact  that  a  VGI 
dataset  is  created  according  to  some  initial  specifications  does  not  necessarily 
mean  that  it  can  be  used  to  cover  all  or  any  requirements  stated  by  potential 
end  users.  This  is  of  particular  importance  when  we  consider  that  in  many 
implicit  VGI  sources,  the  existing  specifications  might  have  no  direct  relation 
to  spatial  or  geomatics  aims.  Some  additional  quality  elements  have  been  pro¬ 
posed  for  crowdsourced  data  that  fall  in  between  internal  and  external  quality 
(Meek  et  al.,  2014),  corresponding  to  what  the  authors  called  the  stakeholder 
model;  these  additional  quality  elements  have  also  been  referred  to  as  quality 
indicators  (Antoniou  and  Skopeliti,  2015)  and  are  discussed  in  more  detail  in 
Section  2.2. 


2.1.1  Positional  Accuracy 

Positional  accuracy  refers  to  the  accuracy  of  the  position  of  features  (i.e.  points, 
lines  or  areas)  within  a  spatial  reference  system,  and  is  usually  assessed  by 
comparing  the  position  of  features  with  their  counterparts  in  reference  data, 
which  are  considered  to  represent  the  ‘true’  position.  This  assessment,  however, 
requires  the  existence  of  reference  data  with  similar  characteristics  and  a  valid 
time  frame  to  make  the  comparison. 
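
The comparison described above can be sketched as follows. This is a minimal illustration, assuming that VGI features have already been matched one-to-one with their reference counterparts and that all coordinates are in a projected system with units of metres; the function names and the sample coordinates are hypothetical and are not taken from any of the studies cited in this chapter.

```python
import math

def positional_errors(vgi_points, reference_points):
    """Return the Euclidean offset of each VGI point from its reference point."""
    if len(vgi_points) != len(reference_points):
        raise ValueError("point lists must be matched one-to-one")
    return [math.hypot(vx - rx, vy - ry)
            for (vx, vy), (rx, ry) in zip(vgi_points, reference_points)]

def summarise(errors):
    """Return the mean error and the root-mean-square error (RMSE)."""
    n = len(errors)
    mean_error = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean_error, rmse

# Hypothetical matched coordinates in metres, for illustration only.
vgi = [(3.0, 4.0), (10.0, 0.0), (0.0, 0.0)]
ref = [(0.0, 0.0), (10.0, 5.0), (0.0, 0.0)]

errors = positional_errors(vgi, ref)
mean_error, rmse = summarise(errors)
print(f"mean error = {mean_error:.2f} m, RMSE = {rmse:.2f} m")
```

Whether the mean error, the RMSE or another statistic is reported is a design choice; the RMSE penalises large individual offsets more heavily than the mean does.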

The  use  of  portable  data  collection  technologies,  such  as  Global  Naviga¬ 
tion  Satellite  Systems  (GNSS)  receivers  embedded  in  smartphones,  is  one  of 
the  most  common  methods  to  collect  the  geographic  position  associated  with 
crowdsourced  data.  Previously,  these  technologies  were  capable  of  delivering 
a spatial precision exceeding ±10 m (Coleman, 2010). However, the precision
is continuously improving, and accuracies of 2-3 m or even better can now
be  achieved,  depending  on  the  receivers  used,  the  observation  method  or  the 
observation  conditions  (Pesyna  et  al.,  2015).  When  combined  with  the  increas¬ 
ing  availability  of  Web-based  maps  and  imagery  (in  some  cases  with  very  high 
spatial  resolution)  that  can  be  used,  for  example,  as  digitising  backdrops,  it  is 
not  surprising  that  the  positional  accuracy  of  VGI  has  increased,  and  is  now 
appropriate  for  a  wide  range  of  applications. 

Several  studies  have  been  conducted  to  assess  the  positional  accuracy  of  VGI 
data.  An  analysis  of  positional  accuracy  of  OSM  in  relation  to  Google  Maps 
and  Bing  Maps  was  undertaken  by  Ciepluch  et  al.  (2010)  for  sites  in  Ireland, 
and concluded that in some locations there were differences of up to 10 m (for
Google Maps) between these sources, although only for some types of features,
which seemed to result from digitisation over low-resolution images. For a set
of OSM road features compared to the UK’s Ordnance Survey data, the average
errors identified were 5.8 m (Haklay, 2010) - a distance unlikely to be seriously
problematic  for  most  land  cover  maps,  but  one  which  could  cause  small  or  nar¬ 
row  features  (ponds,  hedges,  riparian  habitats,  etc.)  to  be  missed  or  misplaced. 
Canavosio-Zuzelski  et  al.  (2013)  performed  a  positional  accuracy  assessment 
of  OSM  as  part  of  a  vector  adjustment  correction.  However,  in  this  case,  rather 
than  accepting  official  survey  data  as  truth,  both  official  data  and  OSM  data 
were  assessed  against  independent  stereo  imagery,  which  means  the  technique 
can  be  applied  to  other  national  agency  and  topographic  datasets  and  has  the 
potential  to  identify  areas  where  the  VGI  surpasses  the  accepted  dataset.  Thus 
the  authors  were  able  to  assess  OSM  against  USGS  (United  States  Geological 
Survey)  and  TIGER  (Topologically  Integrated  Geographic  Encoding  and  Ref¬ 
erencing)  road  data  on  a  more-or-less  equal  footing  -  albeit  for  a  very  small 
area  for  which  the  aerial  imagery  was  available.  In  general,  the  availability  of 
such  accurate  benchmarking  data  is  restricted,  and  this  (or  a  requirement  for 
very  current  information)  may  be  the  very  reason  why  VGI  is  being  elicited. 
The  most  successful  examples  of  such  quality  control  analyses  are  where  feed¬ 
back  is  given  to  the  volunteers  to  enable  them  to  improve  their  contributions, 
e.g.  in  OSM. 

The  positional  accuracy  of  points  representing  geotagged  photographs  may 
also  be  considered  and  analysed,  once  the  specifications  are  available  regard¬ 
ing  what  feature  should  be  positioned.  In  Hochmair  and  Zielstra  (2012),  the 
location  associated  with  the  Flickr  and  Panoramio  photographs  was  com¬ 
pared  to  the  location  of  the  photograph  as  determined  by  the  authors  analys¬ 
ing  what  was  represented  in  the  photograph.  Several  aspects  were  identified 
that  may  influence  positional  quality;  for  example,  the  position  assigned  to 
some  photographs  was  the  location  from  which  the  photograph  was  taken, 
while  for  others  it  was  the  position  of  what  was  represented  in  the  photo¬ 
graph  (potentially  some  distance  away),  without  any  additional  indication  of 
what  the  position  represented.  Another  aspect  identified  that  influenced  the 
positional  accuracy  was  the  confusion  between  similar  features  that  are  pre¬ 
sent  in  the  region  (such  as  different  bridges  over  a  river  close  to  each  other), 
which  became  apparent  when  the  location  of  the  photographs  was  viewed  on 
a  satellite  image  or  digital  map. 

The  assessment  of  the  positional  accuracy  or  the  extent  mapping  of  patchy 
vegetation, highly-textured land use types and ecotones presents much more of
a  challenge.  For  land  cover  mapping,  it  is  often  the  case  that  categorical  labels 
(or  degrees  of  similarity  to  those  labels)  are  being  elicited  from  contributors 
for  attachment  to  user-supplied  location  points  or  to  predefined  polygon  fea¬ 
tures.  Absolute  positional  accuracy  is  still  important,  but  more  often  relates  to 
boundaries  between  mapped  areas  or  to  the  location  of  single  survey  points, 
and  the  predominant  source  of  inaccuracy  is  thematic  misclassification  (to 
which,  of  course,  these  positional  inaccuracies  can  contribute). 

Other approaches may, however, be considered for assessing or increasing
positional accuracy of VGI, due to the amount of data available and their
dynamic  characteristics  (Section  2.2).  To  correct  and  quantify  positional  errors, 
conflation  approaches  that  use  a  set  of  reference  features  are  common  for  dis¬ 
crete  data  that  fit  an  existing  taxonomy  (Coleman,  2010;  Girres  and  Touya, 
2010;  Haklay,  2010). 


2.1.2  Thematic  Accuracy 

Thematic  accuracy  refers  to  the  accuracy  of  classes  or  thematic  tags  associated 
with  specific  locations  or  objects  placed  in  geographic  space,  such  as  classes 
assigned  to  pixels  in  a  land  cover  map  or  tags  assigned  to  a  vector-encoded 
entity,  e.g.  a  highway,  river,  building  or  green  area.  The  assessment  of  thematic 
accuracy  in  VGI  may  be  performed  using  a  traditional  approach,  where  the 
information  is  compared  to  reference  data,  e.g.  satellite  imagery  or  authorita¬ 
tive  data,  by  experts.  For  instance,  Estima  and  Painho  (2013;  2015)  and  Jokar 
Arsanjani  et  al.  (2015b)  investigated  the  thematic  accuracy  of  the  classification 
of  OSM  features  using  the  Corine  Land  Cover  database  and  the  pan-European 
GMES Urban Atlas (GMESUA) dataset as authoritative reference data, respectively. However, the
assessment  of  the  thematic  accuracy  of  VGI  raises  new  challenges,  due  to  the 
lack  of  strict  specifications,  the  characteristics  of  the  contributors  and  contri¬ 
butions,  and  the  type  of  thematic  information  at  stake.  Therefore,  additional 
quality  indicators  may  be  used,  which  are  further  explained  in  Section  2.2.  The 
assignment of thematic information in VGI has many similarities to the extensive
tagging and relevance assessment of documents by volunteers or paid contractors
working via systems such as Amazon’s Mechanical Turk. Many land
cover  mapping  challenges  are  effectively  labelling  problems,  where  predefined 
pixels  or  spatial  features  must  be  assigned  to  particular  classes;  therefore,  some 
of  the  work  developed  in  these  areas  of  application  to  assure  data  quality  may 
be  applied  to  VGI. 
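
The traditional comparison of VGI labels with reference labels can be illustrated with a small sketch. It assumes that each VGI feature has already been paired with a reference feature; the class names and the sample labels are hypothetical.

```python
from collections import Counter

def confusion_counts(vgi_labels, reference_labels):
    """Count (reference label, VGI label) pairs for matched features."""
    return Counter(zip(reference_labels, vgi_labels))

def overall_accuracy(vgi_labels, reference_labels):
    """Share of matched features whose VGI label equals the reference label."""
    agree = sum(v == r for v, r in zip(vgi_labels, reference_labels))
    return agree / len(reference_labels)

# Hypothetical labels for five matched features, for illustration only.
vgi_labels = ["forest", "urban", "water", "urban", "forest"]
ref_labels = ["forest", "urban", "water", "forest", "forest"]

matrix = confusion_counts(vgi_labels, ref_labels)
accuracy = overall_accuracy(vgi_labels, ref_labels)
print(f"overall accuracy = {accuracy:.2f}")
for (ref_class, vgi_class), count in sorted(matrix.items()):
    print(f"reference={ref_class}, VGI={vgi_class}: {count}")
```

The off-diagonal entries of the confusion counts (here, a reference 'forest' feature labelled 'urban') show which classes are confused with each other, which a single accuracy figure does not reveal.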

Currently,  the  majority  of  VGI  is  contributed  for  free,  by  volunteers,  but 
there  is  an  increasing  interest  in  contracting  out  classification  tasks  such  as 
land  cover  labelling  to  paid  workers  in  the  cloud.  In  such  contexts,  spam  and 
errors  are  common,  whether  these  stem  from  a  lack  of  skill  or  from  deliber¬ 
ate  attempts  to  mislead  (including  attempts  to  cheat  the  system  in  a  way  that 
cannot  be  easily  detected).  A  number  of  strategies  have  been  proposed  and 
evaluated  for  getting  the  best  value  out  of  contracted  labellers,  and  in  particular 
for  trading  off  the  value  of  new  information  about  unlabelled  entities  against 
the  value  of  reinforcing  or  correcting  information  about  entities  that  have 
been  labelled  repeatedly  (Ipeirotis  et  al.,  2014).  This  corresponds  to  the  use  of 
additional  quality  indicators,  which  are  further  addressed  in  Section  2.2.  One 
consideration  when  deciding  between  accuracy  improvement  and  new  data 
acquisition  must  be  the  possible  impact  of  errors  when  a  dataset  is  used  in  the 
real world - a balancing act similar to the calculation of ROC (Receiver Operating
Characteristic) curves or sensitivity/specificity calculations for classifiers
and  prediction  algorithms.  The  problem  of  risk  and  liability,  when  considered 
in  the  VGI  world,  is  usually  sidestepped  through  the  use  of  disclaimers,  but 
if  VGI  begins  to  seriously  underpin  Spatial  Data  Infrastructures  (SDIs)  -  see 
Chapter 12 (Demetriou et al., 2017) - and commercial products, the issue will
become  more  pressing. 

Many  of  the  non-VGI  labelling  tasks  described  have  marked  parallels  to  VGI 
problems: for example, data points are often being collected, like ‘ground truth’,
in  order  to  carry  out  a  supervised  classification,  and  in  many  cases  the  labelling 
is  not  simply  binary  or  categorical.  In  such  cases,  when  redundant  observa¬ 
tions  exist  for  each  particular  item,  the  variation  between  labellers  is  not  sim¬ 
ply  noise;  often,  the  uncertainty  and  disagreement,  if  recorded  and  analysed, 
can  yield  important  information  about  the  real  world.  In  the  case  of  VGI,  this 
could  include  conditions  on  the  ground  such  as  vegetation  succession,  change 
of  ownership  or  mixing  of  land  covers.  Many  papers  in  the  field  also  note  the 
importance  of  training  for  labellers  as  well  as  for  models  (e.g.  Clark  and  Aide, 
2011;  Fritz  et  al.,  2012),  and  show  the  sorts  of  learning  curves  that  are  possible 
with  varying  quantities  and  qualities  of  reference  data. 
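
One simple way of retaining the disagreement between labellers, rather than discarding it as noise, is sketched below. The sketch assumes several redundant labels per item and uses a majority vote together with the Shannon entropy of the label distribution as a disagreement score; this is one possible choice among many and is not the method of any study cited here. The function names and sample labels are hypothetical.

```python
import math
from collections import Counter

def majority_label(labels):
    """Most frequent label among redundant contributions (ties: first encountered)."""
    return Counter(labels).most_common(1)[0][0]

def disagreement(labels):
    """Shannon entropy (in bits) of the labels; 0 means full agreement."""
    counts = Counter(labels)
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical labels given by four volunteers to the same location.
labels = ["forest", "forest", "shrub", "forest"]
print(majority_label(labels), round(disagreement(labels), 3))
```

Items with a high disagreement score could then be inspected for conditions such as mixed land covers or recent change, instead of simply being resolved by the vote.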

Of course, even well-trained users vary in their accuracy, and differences
between experts and non-experts are also likely to exist. A comparison of the
quality of results from expert and non-expert volunteers for tag assignment was
carried out by See et al. (2013). The results showed that for some types of tags (in this
particular  case,  ‘human  impact’),  non-expert  volunteers  produced  results  as 
good  as  the  experts,  probably  because  the  concept  was  new  to  both  non-experts 
and  experts  alike  so  both  had  the  same  learning  curves.  However,  for  some 
land  cover  classes,  the  experts  (some  of  whom  had  considerable  experience  in 
image  classification)  performed  better,  but  the  non-experts  showed  improve¬ 
ments  over  time,  especially  when  feedback  on  the  quality  of  their  results  was 
provided  to  them. 


2.1.3  Completeness 

Completeness  refers  to  the  presence  or  absence  of  features,  of  their  attributes 
and of relationships compared to the product’s specification; it is divided into a)
commission, which describes the presence of excess data in a dataset, and b) omission,
which describes the absence of data from a dataset. Completeness is of major
concern in VGI, since many volunteered datasets are demonstrably biased
towards  particular  spatial  regions  (see  e.g.  Haklay,  2010),  but  also  towards  cer¬ 
tain  features  that  are  easier  to  measure  or  towards  themes  or  ‘pet  features’  (Begin 
et  al.,  2013)  that  are  of  particular  interest  to  the  contributing  individual,  or  even 
motivated  by  accessibility  or  digital  inclusion  (Zielstra  and  Zipf,  2010).  This  reli¬ 
ance  on  the  motivation  of  individual  volunteers  will  determine  the  resolution, 
homogeneity,  representativity  and  domain  consistency  of  the  resulting  data.
Where  a  principled  sampling  strategy  can  be  imposed  on  volunteers,  e.g.  a  prob¬ 
abilistic  schema  or  the  systematic,  even  grid  of  the  Degree  Confluence  Project, 
the  volunteered  data  have  the  potential  to  be  more  broadly  applicable,  but  the 
value  of  the  data  will  depend  on  the  coverage  by  volunteers,  meaning  that  many 
platforms  must  actively  direct  users  to  the  desired  locations,  trading  off  poten¬ 
tially  rich  information  elsewhere  against  an  even  placement  of  observations. 

The lack of specifications and the nature of VGI make, in some cases, the
assessment  of  completeness  a  complex  process,  which  cannot  rely  only  on 
direct  unit-based  comparisons,  and  instead  requires  the  development  of  new 
approaches.  Moreover,  in  many  areas,  the  number  of  digitised  VGI  features 
may  exceed  that  found  in  an  authoritative  dataset  (Neis  et  al.,  2011),  making 
a  simple  comparison  of  feature  counts  inappropriate,  and  requiring  a  subtler 
consideration  of  commission  and  omission  (Jackson  et  al.,  2013).  Koukoletsos 
et  al.  (2012)  present  a  method  that  holds  promise  for  such  contexts,  combining 
geometric  and  attribute  constraints  to  match  road  segments  in  OSM  with  those 
found  in  an  authoritative  dataset,  and  to  achieve  a  tile-by-tile  completeness 
assessment.  In  another  study,  Hecht  et  al.  (2013)  proposed  an  object-based 
approach  to  assess  the  completeness  of  building  footprints.  Haklay  (2010) 
identified  a  bias  in  UK  OSM  data  coverage  towards  more  affluent  areas,  and 
related this to the fact that socially marginal (and less-mapped) areas may be
the  very  locations  where  charities  and  agencies  requiring  free  data  are  operat¬ 
ing.  Brovelli  et  al.  (2017)  developed  a  web  application  to  compare  OSM  road 
data  with  authoritative  road  data,  enabling  the  assessment  of  completeness  and 
positional  accuracy  of  OSM  data.  Ciepluch  et  al.  (2010)  also  compared  the 
spatial  coverage  of  OSM  to  that  of  Google  Maps  and  Bing  Maps,  and  identified 
regions  with  different  levels  of  coverage  in  the  three  datasets.  Globally,  this 
bias  is  being  somewhat  redressed  by  the  volunteers’  own  efforts  to  improve 
coverage, and by focused initiatives such as KompetisiOSM in Indonesia², but
it  remains  the  case  that  coverage  is  extremely  heterogeneous  in  VGI,  both  spa¬ 
tially  and  thematically,  and  that  the  absence  of  information  in  an  area  makes 
it  difficult  to  draw  robust  conclusions  about  trends.  Brunsdon  and  Comber 
(2012)  specifically  addressed  the  lack  of  experimental  design  in  a  volunteered 
dataset  recording  the  first  flowering  date  of  lilacs  in  the  USA  by  applying  ran¬ 
dom  coefficient  modelling  and  bootstrapping  approaches  to  tease  out  more 
reliable  information  on  phenological  trends. 
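
The commission and omission components of completeness defined at the start of this subsection can be computed tile by tile once features have been matched. The sketch below assumes that a matching step (for example one combining geometric and attribute constraints) has already assigned a shared identifier to each pair of corresponding features; the identifiers, tile names and function name are hypothetical.

```python
def completeness_rates(vgi_ids, reference_ids):
    """Return (omission, commission) rates for one tile.

    Both arguments are collections of feature identifiers; a feature
    present in both datasets carries the same identifier in each.
    """
    vgi, ref = set(vgi_ids), set(reference_ids)
    # Omission: reference features with no counterpart in the VGI data.
    omission = len(ref - vgi) / len(ref) if ref else None
    # Commission: VGI features with no counterpart in the reference data.
    commission = len(vgi - ref) / len(vgi) if vgi else None
    return omission, commission

# Hypothetical tiles: tile name -> (VGI feature ids, reference feature ids).
tiles = {
    "tile_A": ({"r1", "r2", "x9"}, {"r1", "r2", "r3", "r4"}),
    "tile_B": ({"r5", "r6", "x7", "x8"}, {"r5", "r6"}),
}
rates = {name: completeness_rates(v, r) for name, (v, r) in tiles.items()}
for name, (omission, commission) in rates.items():
    print(f"{name}: omission = {omission:.2f}, commission = {commission:.2f}")
```

As noted above, a high commission rate measured this way does not necessarily indicate error: where VGI is richer than the reference dataset, the 'excess' features may be real features that the reference data lack.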


2.1.4  Temporal  Quality 

Temporal  quality  refers  to  the  quality  of  the  temporal  attributes,  such  as  date 
of  collection,  date  of  publication,  update  frequency,  last  update  or  temporal 
validity  (also  referred  to  as  currency),  and  also  to  relationships  between  the 
temporal  validity  of  features.  Currency  is  one  aspect  of  traditional  data  quality 
where VGI can be expected to surpass authoritative data, especially in dynamically
changing environments, given the large numbers of citizens who are acting
as sensors at any one time. However, there is often a trade-off between currency
and other facets of data quality. The issue of representativeness becomes
even  more  vexed  when  the  spatial  domain  is  extended  to  the  spatio-temporal 
domain,  and,  unless  a  temporal  sampling  scheme  is  also  imposed  upon  con¬ 
tributors,  the  density  and  coverage  of  a  VGI  dataset  over  a  small  time  range  can 
be  very  limited.  For  citizen  sensor  networks,  which  are  largely  made  up  of  auto¬ 
mated  instruments,  such  as  the  Weather  Underground,  the  observation  pattern 
across  time  is  fairly  consistent.  However,  in  other  contexts  (e.g.  presence-only 
species  observations  and  the  mapping  of  urban  infrastructure),  a  user  will  need 
to  carefully  consider  the  ranges  of  data  that  are  appropriate  for  their  purpose, 
and  whether  cumulative  observations  are  valuable.  In  making  this  decision, 
they  will  probably  require  metadata  on  the  individual  features,  e.g.  date  stamps 
and  data  on  feature  updates.  An  important  consideration  here  is  that  the  date 
stamp  should  reflect  the  time  at  which  the  measurement  or  observation  was 
made,  rather  than  the  time  at  which  it  was  uploaded  or  digitised,  depending  on 
the  application  to  which  the  data  are  applied  (see  e.g.  Antoniou  et  al.,  2016a). 
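
The difference between the observation date and the upload date can matter in practice, as the following sketch suggests. It assumes that each feature carries both dates as metadata; the field names `observed_on` and `uploaded_on`, the one-year threshold and the sample dates are hypothetical.

```python
from datetime import date

def age_in_days(reference_date, as_of):
    """Number of days between a date stamp and the date of use."""
    return (as_of - reference_date).days

def is_current(feature, as_of, max_age_days):
    """Judge currency from the observation date, not the upload date."""
    return age_in_days(feature["observed_on"], as_of) <= max_age_days

# Hypothetical feature observed in the field long before it was uploaded.
feature = {"observed_on": date(2015, 1, 10), "uploaded_on": date(2016, 3, 1)}
as_of = date(2016, 3, 31)

age_observed = age_in_days(feature["observed_on"], as_of)
age_uploaded = age_in_days(feature["uploaded_on"], as_of)
print(age_observed, age_uploaded, is_current(feature, as_of, max_age_days=365))
```

In this example the feature would look recent if the upload date were used, although the observation itself is more than a year old.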

Even though VGI has great potential to provide up-to-date information,
its currency is likely to vary considerably over space and between the types of
phenomena or features being mapped, since VGI depends on the availability of
interested volunteers to collect each particular type of data at the required
locations.


2.1.5  Logical  Consistency 

Logical  consistency  refers  to  the  degree  of  adherence  to  logical  rules  of  data 
structure,  attribution  and  relationships  as  described  in  a  product’s  specifica¬ 
tions.  Logical  consistency  of  an  observation  makes  little  sense  in  isolation:  it 
must  usually  be  assessed  with  reference  to  other  data  from  the  same  source,  or 
from  independent  (and  sometimes  authoritative)  data,  and  lends  itself  to  auto¬ 
mated  quality  assessment  -  for  example,  to  the  use  of  rules  such  as  ‘forest  fires 
are  highly  unlikely  in  dense  urban  areas’.  Hashemi  and  Ali  Abbaspour  (2015) 
used  the  concept  of  spatial  similarity  in  a  multi-representation  data  combina¬ 
tion  to  build  a  framework  to  determine  the  probable  inconsistencies  in  OSM, 
aiming  to  help  in  evaluating  the  logical  consistency  of  VGI  data.  Bonter  and 
Cooper  (2012)  discuss  the  use  of  a  smart  filter  system  in  the  context  of  species 
identification  in  Project  FeederWatch:  when  participants  enter  counts  of  spe¬ 
cies  that  are  too  high  or  species  that  do  not  normally  appear  on  standard  lists, 
the  filter  is  activated  and  users  are  informed  of  unusual  observations,  thereby 
correcting  potential  errors  in  real-time.  Similar  smart  filters  could  be  devised 
and  put  into  place  in  other  types  of  VGI  projects,  thereby  addressing  some 
aspects  of  logical  consistency. 
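
A rule-based filter of the kind described above might look like the following sketch. It is not the Project FeederWatch implementation: the checklist, the count thresholds and the function name are hypothetical, and a real system would derive them from regional and seasonal records.

```python
def smart_filter(species, count, expected_max):
    """Return warnings for an observation; an empty list means no rule fired.

    expected_max maps each species on the local checklist to the largest
    count that is normally reported for it.
    """
    warnings = []
    if species not in expected_max:
        warnings.append(f"{species} is not on the local checklist")
    elif count > expected_max[species]:
        warnings.append(
            f"count {count} exceeds the usual maximum of "
            f"{expected_max[species]} for {species}"
        )
    return warnings

# Hypothetical checklist with usual maximum counts, for illustration only.
checklist = {"house sparrow": 40, "blue jay": 10}

print(smart_filter("blue jay", 3, checklist))
print(smart_filter("blue jay", 25, checklist))
print(smart_filter("emperor penguin", 1, checklist))
```

Returning warnings rather than rejecting the record leaves the decision with the contributor, who can confirm a genuinely unusual observation or correct a data entry error.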


2.1.6  Usability

As  mentioned  above,  usability  (or  fitness-for-use)  refers  to  the  external  quality 
of  a  dataset  and  is  focused  on  the  needs  of  the  user.  The  five  aforementioned 
data  quality  elements  may  be  aggregated  in  order  to  describe  the  overall  usabil¬ 
ity  of  a  specific  dataset  for  a  particular  use,  i.e.,  fitness-for-purpose.  In  other 
words,  usability  acts  as  a  complementary  element  by  linking  both  user  require¬ 
ments  and  data  quality  measures  to  check  whether  the  data  for  a  specific  appli¬ 
cation  can  be  used  (Guptill  and  Morrison,  1995;  Devillers  et  al.,  2007). 
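
The link between user requirements and data quality measures can be sketched as a simple threshold check. This is only one possible way of aggregating the elements, under the assumption that each requirement can be expressed as a maximum or minimum acceptable value; the element names, measured values and thresholds are hypothetical.

```python
def unmet_requirements(measured, requirements):
    """Return the quality elements that fail a user's requirements.

    requirements maps an element name to (threshold, kind), where kind is
    "max" (value must not exceed threshold) or "min" (must not fall below).
    Elements that have not been measured are reported as unmet.
    """
    failures = []
    for element, (threshold, kind) in requirements.items():
        value = measured.get(element)
        if value is None:
            failures.append(element)
        elif kind == "max" and value > threshold:
            failures.append(element)
        elif kind == "min" and value < threshold:
            failures.append(element)
    return failures

# Hypothetical quality measures for one dataset.
measured = {"positional_rmse_m": 5.8, "thematic_accuracy": 0.85, "omission_rate": 0.30}

# Two hypothetical users with different needs.
routing = {"positional_rmse_m": (10.0, "max"), "omission_rate": (0.20, "max")}
land_cover = {"positional_rmse_m": (10.0, "max"), "thematic_accuracy": (0.80, "min")}

print(unmet_requirements(measured, routing))
print(unmet_requirements(measured, land_cover))
```

The same dataset fails the first set of requirements and meets the second, which is the sense in which external quality depends on the intended use rather than on the dataset alone.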

Table 1 summarises the requirements and specific aspects regarding the
application of ISO quality measures to VGI. Section 3 further develops the
establishment of workflows and the combination of quality indices to assess
VGI quality and, in turn, usability.


2.2  Quality  Measures  Specific  to  VGI 

When  considering  VGI,  other  data  quality  indicators  are  required  to  supple¬ 
ment  those  proposed  in  the  ISO  framework.  This  occurs  not  only  because  in 
many  situations  comparison  with  authoritative  datasets  is  not  possible,  but 
also  because  the  characteristics  and  nature  of  VGI  enable  the  use  of  indicators 
that  do  not  usually  make  sense  when  applied  to  data  created  by  professionals. 
These  indicators  may  provide  valuable  information  even  though  in  most  situa¬ 
tions  they  do  not  assess  accuracy  but  instead  assess  data  reliability  or  credibility 
(which are considered as synonyms in this chapter). As these indicators may
provide data that allow quality estimation in real-time or near real-time, they
enable the development of automated approaches that may be used to improve
the process of data collection, requiring, for example, confirmation and/or
additional checks by the contributors.


Table 1: ISO quality elements, their requirements and issues related to their
use with VGI.

|                  | ISO quality elements | Requirements | Issues for the application to VGI |
|------------------|----------------------|--------------|-----------------------------------|
| Internal quality | Positional accuracy; Thematic accuracy; Completeness; Temporal quality | Data specification; Existence of reference data with similar characteristics and valid time frame | Lack of specifications; Dynamic nature of VGI; Inexistence of comparable reference data; Spatial and thematic heterogeneity |
| Internal quality | Logical consistency | Other data of the same source or independent data | Applicable to VGI; May enable automatic validation checks |
| External quality | Usability | Specification of user needs | May be assessed by combining quality measures and indicators |

Different  suggestions  have  been  put  forth  regarding  what  these  indica¬ 
tors  might  look  like  (Table  2).  For  example,  Goodchild  and  Li  (2012)  provide 
three  broad  categories  of  measures  to  ensure  VGI  data  quality:  i)  crowdsourc¬ 
ing  revision,  where  data  quality  can  be  ensured  by  multiple  contributors;  ii) 
social  measures,  which  focus  on  the  assessment  of  contributors  themselves 
as  a  proxy  measure  for  the  quality  of  their  contributions;  and  iii)  geographic 
consistency,  through  an  analysis  of  the  consistency  of  contributed  entities. 
Meek  et  al.  (2014)  provide  three  models  of  data  quality,  where  the  stakeholder 
model  sits  in  between  the  more  traditional  internal  (producer)  and  external 
(consumer)  quality  indicators,  and  they  suggest  a  number  of  different  quality 
elements,  including  vagueness,  ambiguity,  judgement,  reliability,  validity  and 
trust.  Bordogna  et  al.  (2014)  also  provide  a  set  of  quality  indicators  for  VGI 
that  are  arranged  into  internal  and  external  quality,  where  the  internal  quality 
measures  are  grouped  by  type  of  VGI,  i.e.  measurements  or  text-based  VGI, 
and  the  external  quality  measures  are  grouped  by  reliability  of  the  individual 
and  reputation  of  the  organisation.  Senaratne  et  al.  (2016)  review  VGI  quality 
assessment  methods  and  separate  them  into  measures  and  indicators  of  quality, 
where  the  former  correspond  to  the  traditional  accuracy  assessment  measures 
described  in  the  previous  section,  and  the  latter  are  referred  to  as  qualitative 
and  more  abstract  quality  indicators,  such  as  local  knowledge,  experience  and 
reputation.  They  also  suggest  that  an  additional  approach  to  ensure  data  quality, 
referred  to  as  ‘data  mining’,  should  be  added  to  the  ones  proposed  by  Goodchild 
and  Li  (2012).  Antoniou  and  Skopeliti  (2015)  propose  the  aggregation  of  the 
quality  indicators  into  three  broad  categories:  i)  data  indicators;  ii)  demographic 
and  other  socio-economic  indicators;  and  iii)  indicators  about  the  contributors. 
These  may  be  considered  to  integrate  the  types  of  indicators  mentioned  in  the 
above  different  frameworks  and  are  developed  further  in  this  chapter. 

Table 2: Categories of quality measures proposed for VGI.

| Source | Categories |
|--------|------------|
| Goodchild and Li (2012) | Crowdsourcing revision; Social measures; Geographic consistency |
| Meek et al. (2014) | Internal quality indicators; Stakeholder model; External quality indicators |
| Bordogna et al. (2014) | Internal quality; External quality |
| Antoniou and Skopeliti (2015) | Data indicators; Demographic and socio-economic indicators; Contributor indicators |
| Senaratne et al. (2016) | Measures of quality; Indicators of quality; Data mining |


2.2.1  Data-based  Indicators

One  important  group  of  quality  indicators  of  VGI  are  those  that  involve  com¬ 
parison  with  other  sources  of  crowdsourced  data  (Table  3).  One  possibility  is 
to  measure  the  ‘agreement’  to  the  corresponding  data,  which  we  define  here  as 
the  coherence  of  the  data  with  other  sources  of  crowdsourced  data.  Agreement 
can  be  measured  between  datasets  using  a  Boolean  measure  or  a  continuous 
variable  with  traditional  measures  such  as  distance  between  corresponding  ele¬ 
ments,  attribute  comparisons,  etc.,  and  may  be  considered  an  indicator  of  data 
reliability.  Logical  consistency  of  data  available  in  different  data  sources  can 
also  be  used  to  estimate  data  reliability,  identifying  if,  according  to  the  types  of 
features  present  in  all  available  data  sources,  a  particular  contribution  is  likely 
to  be  correct  or  not.  As  stressed  by  Sui  et  al.  (2013),  approaches  that  compare 
data  based  on  their  geographic  location  have  not  yet  been  developed  enough. 
Note,  however,  that  all  these  indicators  may  be  used  to  measure  data  reliability, 
but  not  to  assess  data  accuracy  if  none  of  the  data  under  comparison  can  be 
considered  as  reference  data. 
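
The Boolean and continuous forms of agreement mentioned above can be sketched for two crowdsourced sources as follows. The sketch assumes that corresponding elements have already been paired, that coordinates are in metres in a projected system, and that agreement requires both a small positional difference and identical tags; the tolerance, field names and sample data are hypothetical.

```python
import math

def agreement(source_a, source_b, tolerance_m):
    """Compare paired elements from two crowdsourced sources.

    Returns the distances between corresponding elements (continuous
    measure), a flag per pair (Boolean measure) and the share of pairs
    that agree in both position and tag.
    """
    distances, flags = [], []
    for a, b in zip(source_a, source_b):
        d = math.hypot(a["xy"][0] - b["xy"][0], a["xy"][1] - b["xy"][1])
        distances.append(d)
        flags.append(d <= tolerance_m and a["tag"] == b["tag"])
    return distances, flags, sum(flags) / len(flags)

# Hypothetical paired elements from two sources, for illustration only.
source_a = [{"xy": (0, 0), "tag": "school"},
            {"xy": (100, 100), "tag": "cafe"},
            {"xy": (50, 50), "tag": "park"}]
source_b = [{"xy": (3, 4), "tag": "school"},
            {"xy": (100, 130), "tag": "cafe"},
            {"xy": (50, 52), "tag": "garden"}]

distances, flags, share = agreement(source_a, source_b, tolerance_m=10)
print(distances, flags, round(share, 2))
```

Because neither source is treated as reference data, the resulting share is an indicator of reliability only: two sources can agree with each other and both be wrong.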

Table 3: Data-based quality indicators proposed for VGI.

Indicators category: Data-based indicators (assess data reliability)

Indicator                              Description / Examples
Coherence with other sources of        Compare, for example, geometric attributes
corresponding data (not considered     such as distance between corresponding
as reference)                          elements or overlaps
External logical consistency           Logical consistency of VGI with
                                       non-corresponding data available in other
                                       data sources
Internal logical consistency           Logical consistency of the VGI dataset itself
VGI metadata                           Number of versions, feature corrections,
                                       stability against changes, observation
                                       methods, equipment used, date of observation

Another set of indicators that could reveal VGI quality can be calculated by
examining solely the VGI dataset itself and the associated metadata (Table 3).
The work in this area has focused primarily on assessing OSM data quality. Such
indicators could include the total length of features and the point density in a
square-based grid, as calculated by Ciepluch et al. (2010), or the number of
versions, the stability against changes and the corrections and rollbacks of
features, as examined by Keßler and de Groot (2013). The provenance of features
contributed to OSM (i.e. whether the data were captured using a GPS, were
manually digitised or resulted from a bulk import) has been the focus of the
quality-related work of van Exel et al. (2010). Finally, Barron et al. (2014)
have developed iOSMAnalyzer, which uses more than 25 methods and indicators to
assess OSM data quality based solely on data history. Although some of these
indicators are related to the aforementioned quality component of completeness
(Section 2.1.3), completeness in authoritative GI would not be measured in this
way. Hence there is a need to find completeness and other data indicators that
are customised to the nature of VGI.
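
The point density in a square-based grid mentioned above can be sketched in a few lines. This is only an illustrative computation, assuming points are given as (x, y) pairs in a projected coordinate system with metric units; the function name and the cell size are hypothetical and do not reproduce the procedure of Ciepluch et al. (2010).

```python
from collections import Counter


def grid_point_density(points, cell_size):
    """Return a mapping from grid cell (column, row) to points per unit area.

    points: iterable of (x, y) in projected units (assumed metres).
    cell_size: side length of the square cells, in the same units.
    """
    counts = Counter()
    for x, y in points:
        # Floor division assigns each point to the cell that contains it.
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    area = cell_size * cell_size
    return {cell: n / area for cell, n in counts.items()}
```

Cells with few or no points relative to their neighbours may hint at incomplete coverage, although a low density can equally reflect a genuinely sparse area.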

Some of the facets of traditional metadata are of particular interest in
assessing and using VGI. For example, the lineage of a record or dataset may
include its edit history and information on how it was measured, and can be
especially important in the automated assessment of VGI fitness-for-use. Examples
of metadata potentially useful for VGI are equipment used in measurements; data
about the volunteer (contributor indicator); date and time of data collection; or
atmospheric conditions at the time a particular observation was taken. Individual
metadata about heterogeneous observations can be extremely useful in identifying
bias and likely trustworthiness, as seen, for example, in the context of amateur
weather monitoring (Bell et al., 2013) and digitised trails (Esmaili et al.,
2013). However, metadata are often not available for VGI, which limits, to some
extent, the use of these approaches. To overcome this difficulty, methodologies
have already been proposed to create metadata for VGI (Kalantari et al., 2014).


2.2.2  Demographic  and  Socio-economic  Indicators 

Empirical studies have revealed that there is a correlation between the
demographics of an area and the completeness and positional accuracy of the data
(Mullen et al., 2015). It has also been shown that the completeness of VGI data
tends to be lower in areas with lower population density (i.e. rural areas)
(Zielstra and Zipf, 2010). At the same time, population density correlates
positively with the number of contributions, thus affecting data completeness
or positional accuracy (see e.g. Zielstra and Zipf, 2010; Haklay, 2010; Haklay
et al., 2010; Jokar Arsanjani and Bakillah, 2015).

Table 4: Demographic and socio-economic quality indicators proposed for VGI.

Indicators category: Demographic and socio-economic indicators of the region
(indicators of data quality)
Relevance (all indicators): Show correlation with data quality parameters

Indicators:
Demographics
Population density
Social deprivation
Socio-economic reality
Income
Population age

Closely related to demographics are other socio-economic factors, which may also
influence the overall quality (Tulloch, 2008; Elwood et al., 2013). For example,
it has been shown that social deprivation and the underlying socio-economic
reality of an area can have a considerable effect on completeness and positional
accuracy of OSM data (Haklay et al., 2010; Antoniou, 2011). Similarly, other
factors such as high income and low population age can result in a higher number
of contributions and therefore higher VGI quality in terms of positional accuracy
and completeness (Girres and Touya, 2010; Jokar Arsanjani and Bakillah, 2015).

Thus, if census or social survey data are available for an area, they might be
used to make inferences about the quality of VGI data over geographic space.
Table 4 summarises the above-mentioned indicators.


2.2.3  Contributor  Indicators 

Table 5: Contributor quality indicators proposed for VGI.

Indicators category: Contributor indicators (assess contributor reliability)
Relevance (all indicators): Expected correlation with data reliability

Indicator                                   Description
Contributors’ interests                     Infer contributor bias to particular
                                            features
Contributors’ history of contributions      Infer contributor trustworthiness
Contributors’ recognition by other          Infer contributor reliability
contributors
Contributors’ location                      Infer contributor local knowledge
Contributors’ behaviour                     Infer contributor difficulty in
                                            contributing

Quality indicators can include the history of contributions, the profiling of
contributors or the experience, recognition and local knowledge of the
individual (van Exel et al., 2010; Table 5). Moreover, the number of contributors
in certain areas or features has been examined, and has been positively
correlated with data completeness and positional accuracy (Keßler and de Groot,
2013). Methods for the automatic computation of contributor reliability regarding
thematic information in VGI have been proposed by several authors. Haklay et al.
(2010) and Tang and Lease (2011) stress the need for multiple observations and
observers to enable consensus-based data quality assessments. Foody and Boyd
(2012) and Foody et al. (2013) proposed a method for using these repeated
observations to concretely assess the quality of VGI contributors using a latent
class analysis of VGI in relation to land cover.
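
To make the idea of a consensus-based assessment concrete, the sketch below takes the labels that several observers assigned to the same location and returns the majority label together with the share of observers who chose it. It is a plain majority vote, written as an assumption-laden illustration; it is much simpler than, and should not be confused with, the latent class analysis of Foody and Boyd (2012).

```python
from collections import Counter


def consensus(labels):
    """Return (majority label, share of observers who chose it).

    labels: non-empty list of labels given by different observers for
    one location. On a tie, the label encountered first wins.
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)
```

A low share signals disagreement among observers, which is one reason why multiple observations per location are needed before any consensus can be trusted.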

Differences between volunteers are always likely to exist, and, therefore, in
the examples of ‘social’ quality assessment described above, known individuals
could be identified and given a more trusted status, and these individuals could
then be actively responsible for reviewing the work of others. However, when
considering thematic quality, the issue of contributor reliability can be more
complicated than a single ranking. Some contributors excel at labelling
particular types of objects or habitats, but perform poorly elsewhere in the
problem domain. Knowledge of the strengths and weaknesses of the volunteers
allows a more nuanced consideration of the trustworthiness of their
contributions, but computing it often requires independent reference data. For
example, Comber et al. (2013) calculated the consistency and skill of each
volunteer in relation to each land cover class, using a number of control points
for which the land cover had been independently determined by experts, and
demonstrated that at least some concerns about the quality of VGI can be
addressed through careful data collection, the use of control points to evaluate
volunteer performance and spatially explicit analyses.
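
A per-class view of volunteer skill, in the spirit of the control-point approach described above, could be computed as follows. This is a simplified sketch: the tuple layout, the function name and the use of a plain proportion correct per expert-determined class are assumptions for illustration, and do not reproduce the measures used by Comber et al. (2013).

```python
from collections import defaultdict


def per_class_accuracy(answers, control):
    """Proportion of correct labels per (volunteer, true class).

    answers: iterable of (volunteer, point_id, label).
    control: mapping point_id -> label determined by experts.
    Answers at points that are not control points are ignored.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for volunteer, point_id, label in answers:
        if point_id not in control:
            continue  # no expert label available for this point
        truth = control[point_id]
        totals[(volunteer, truth)] += 1
        if label == truth:
            hits[(volunteer, truth)] += 1
    return {key: hits[key] / totals[key] for key in totals}
```

The output keeps one score per volunteer and class rather than a single ranking, which reflects the observation that a contributor may label some classes well and others poorly.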

In the context of labelling for commercial gain, the workers do not see the
submissions of others, and it is necessary to automate the process of
identifying trustworthy experts against whom the work of others can be
benchmarked (Raykar and Yu, 2012). Vuurens and de Vries (2012) tackle this issue
by deriving patterns from the behaviour of different worker types, and attempt
to diagnose the nature, and thus the likely error rate, of particular workers.
For example, they note that ‘diligent’ workers are less likely to differ in their
votes by more than one step on an ordinal scale of labels, and they exploit this
fact to interpret the difference between contributors’ judgements to identify
their trustworthiness. However, there are many contexts where no natural ordering
is present in the labels from which a contributor can choose.
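
The ‘more than one step on an ordinal scale’ observation can be turned into a simple diagnostic. The sketch below reports, for one worker, the share of votes that lie more than one step away from the median vote of the other workers on the same item. The data layout, the use of the median as the comparison point and the function name are assumptions for this example; the actual method of Vuurens and de Vries (2012) is more elaborate.

```python
from statistics import median


def large_disagreement_rate(votes, worker):
    """Share of a worker's votes more than one ordinal step from the others.

    votes: mapping item -> {worker: integer vote on an ordinal scale}.
    Items the worker did not vote on, or voted on alone, are skipped.
    Returns None when no comparable item exists.
    """
    far = 0
    seen = 0
    for item_votes in votes.values():
        if worker not in item_votes:
            continue
        others = [v for w, v in item_votes.items() if w != worker]
        if not others:
            continue
        seen += 1
        if abs(item_votes[worker] - median(others)) > 1:
            far += 1
    return far / seen if seen else None
```

As the text notes, a diagnostic of this kind only makes sense when the labels have a natural ordering.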

Some of the facets of metadata regarding the volunteer, such as age, address,
level of education or interests, are of interest in assessing VGI reliability. It
is also possible to construct metadata based on the past behaviour of a user or
the number of times their contributions have been identified as erroneous by
other volunteers, which requires the storing of all alterations and changes made
to the system. This may enable, through the definition of a set of rules, the
automatic extraction of quality information, which may be used as an initial
indicator of credibility, enabling the exclusion of some VGI from an analysis
based on the likelihood that it might be less trustworthy. An example of these
procedures is the approach proposed by Lenders et al. (2008), where the
contributor’s reliability is assessed using the information about the
volunteer’s location and the time of the contribution. These types of approaches
may be particularly useful for NMAs (see Chapter 13 by Olteanu-Raimond et al.,
2017), for example, to identify which contributions are more reliable and
therefore worthy of allocations of resources for their validation, as all
crowdsourced data used by NMAs need to be validated by professionals (Fonte et
al., 2015a).

It is also possible to measure the ‘vagueness’ of contributions, defined by
Meek et al. (2014) as the inability of a contributor to make a clear-cut
decision. For example, when volunteers are asked to interpret satellite imagery
in Geo-Wiki, they attach a confidence rating to their choice, which ranges from
highly uncertain to full confidence in their answer (Fritz et al., 2012). These
vagueness measures can be used as filters on the data or to apply weights to
those answers with higher vagueness.
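
Both uses of a vagueness measure, filtering and weighting, fit in one short function. In this sketch each answer carries a confidence value between 0 and 1; mapping a categorical confidence rating onto that numeric scale, using the confidence directly as a weight and the name `min_confidence` are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_label(answers, min_confidence=0.0):
    """Return the label with the largest total confidence, or None.

    answers: list of (label, confidence) with confidence in [0, 1].
    Answers below min_confidence are filtered out; the remaining ones
    are weighted by their confidence, so vaguer answers count less.
    """
    totals = defaultdict(float)
    for label, confidence in answers:
        if confidence < min_confidence:
            continue  # filter: too vague to be used at all
        totals[label] += confidence
    if not totals:
        return None
    return max(totals, key=totals.get)
```

With this weighting, one confident answer can outweigh two hesitant ones, which a simple majority vote would not allow.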

3 Developing Quality Assurance Workflows and Combining Indicators

Although many different quality indicators and measures for VGI have emerged
over the last decade, combining these indicators into an integrated quality
assessment is an ongoing area of VGI data quality research. For example, Bishr
and Mantelas (2008) have proposed a ‘trust and reputation model’, where these
two concepts together are proxies for data quality (Figure 1). Users rate each
other’s contributions on a score range of 1 to 10, which makes up the reputation
component. Users are also linked to one another through a social network, which
can be used to measure the strength of the relationship between two individuals.
These two components are combined and then divided by the logarithm of the
distance between a contributor’s location and the observation to calculate a
trust rating. This trust model therefore takes both spatial context and
reputation, through user ratings and the relationships between contributors,
into account. The model remains theoretical and was not applied in the paper
cited above, but an example of data collection for an urban growth scenario was
outlined. The inclusion of relationships via social networking could give
greater weight to the ratings of certain individuals.
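
The structure of the trust rating described above can be sketched as follows. The text only states that the two components are ‘combined’ and divided by the logarithm of the distance, so the additive combination, the natural logarithm, the distance unit and the requirement that the distance exceed 1 are all assumptions of this example, not details confirmed by Bishr and Mantelas (2008).

```python
import math


def trust_rating(reputation, tie_strength, distance):
    """Sketch of a distance-discounted trust rating.

    reputation: average rating received from other users (1 to 10).
    tie_strength: strength of the social-network relationship
        (scale assumed comparable to the reputation score).
    distance: distance between contributor and observation, in an
        assumed unit for which values above 1 are meaningful.
    """
    if distance <= 1:
        # log(distance) would be zero or negative; outside this sketch.
        raise ValueError("distance must be greater than 1")
    # Assumption: the two components are combined by simple addition.
    return (reputation + tie_strength) / math.log(distance)
```

Whatever the exact combination rule, the logarithmic denominator means that trust falls as the contributor moves further from the observation, but ever more slowly at larger distances.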

Jokar Arsanjani et al. (2015a) have for their part proposed a multivariate
indicator, referred to as the contribution index (CI), that combines diverse
classic quality indicators, as well as user perspectives of data, including the
number of volunteers involved in mapping a particular feature along with the
frequency of contributions (Figure 2).

However, the main problem with the assessment of VGI based on fitness-for-use is
that many methods and measures are designed to assess a specific VGI dataset or
a single use case, and are not generalisable or transferable to other VGI
datasets or purposes. Nevertheless, some papers have appeared in which quality
assurance workflows have been proposed. For example, Bordogna et al. (2015)
propose a flexible system that allows users to specify minimum acceptable
quality levels based on their requirements (Figure 3). The system contains a
series of quality indicators, including both standard internal quality measures
such as positional accuracy and ones specifically geared towards VGI (see
Section 2.2). The user can rank the importance of the different indicators and
specify a minimum acceptable level of quality for each indicator, and then the
system acts as a filter to return only those items from the VGI database that
meet all of these minimum levels; the authors perform a demonstration of the
system on a VGI dataset of glaciological observations.
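
The filtering step of such a system, returning only the items that meet every user-specified minimum, can be illustrated with a short sketch. The item layout, the indicator names in the usage note and the treatment of missing indicators as failures are hypothetical choices for this example; the ranking of indicators by importance is not modelled, and nothing here reproduces the implementation of Bordogna et al. (2015).

```python
def filter_by_minimum_quality(items, minimums):
    """Keep only items whose indicators all meet the required minimums.

    items: list of dicts, each with an 'indicators' mapping from
        indicator name to a numeric quality score.
    minimums: mapping from indicator name to minimum acceptable score.
    An item lacking a required indicator is treated as failing it.
    """
    return [
        item for item in items
        if all(item["indicators"].get(name, float("-inf")) >= level
               for name, level in minimums.items())
    ]
```

For instance, with minimums of 0.85 for a hypothetical `positional_accuracy` score and 0.5 for a hypothetical `contributor_reliability` score, an item scoring 0.95 and 0.4 would be rejected, because every minimum must be met.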

The  creation  of  workflows  that  allow  for  the  assessment  of  different  aspects  of 
quality  has  also  been  proposed.  The  framework  proposed  by  COBWEB  includes 
a  quality  assessment  workflow  that  uses  some  automatic  validation  procedures 
to  obtain  data  quality  indicators  to  insert  in  the  information  metadata  (Meek 
et  al.,  2016),  while  Ballatore  and  Zipf  (2015)  have  proposed  a  multidimensional 
framework  to  assess  conceptual  quality. 

The need to assess fitness-for-use has been present even without considering
VGI, and methodologies to make this assessment have already been proposed in
other contexts. For example, Lush (2015) proposed the creation of a GEO label
that aims to be a mechanism to assist users to determine the fitness-for-use of
datasets: a visual tool was developed that aggregates information about the
producer, data lineage, compliance with standards, existence of quality
information, user feedback, expert reviews and citation information. These types
of tools may be adapted to the characteristics of VGI and generate user-friendly
tools that can assist the user in identifying which data are appropriate for
each application, according to their needs.

This  is  an  area  of  research  that  we  anticipate  will  continue  to  grow  in  the 
future. 


4  Conclusions 

This chapter considered the quality of VGI from the perspective of ISO 19157 and
then presented additional quality measures designed to handle the specific
nature of VGI, e.g. data-specific indicators, demographic and socio-economic
indicators, and indicators related to the contributors. Authoritative data and
VGI have similarities, i.e. both are examples of spatial data that can be
assessed using the measures set out in ISO 19157. However, there are also some
differences between these two data sources that require new ways of quality
assessment, since the specific nature of VGI presents some problematic issues as
well as new challenges. These issues and challenges include the heterogeneity of
the data and contributors, spatial bias, lack of specifications, the dynamic
nature in which the data are updated, the patchiness of the contributions and
the lack of authoritative data, all of which have driven the development of new
assessment methods for VGI. For example, the lack of reference data (as well as
the static nature of reference data) has led to studies that have moved away
from the need to use authoritative data to assess the quality of VGI; this has
resulted in the creation of new data indicators, e.g. consistency related to
multiple contributions at the same place or agreement of multiple contributions
of the same set of features. At the same time, the social element of VGI has led
to research into socio-economic and demographic indicators, while the pivotal
role of the contributor in VGI has stimulated research around a diverse set of
indicators for quantifying contributors’ characteristics.

Another area of more recent VGI quality-related research has been in combining
indicators, either as a way to visualise the quality using graphical approaches,
such as through a GEO label (Lush, 2015), or to create workflows that allow for
the assessment of different aspects of quality. However, few attempts have yet
been implemented that use automated processes to assess VGI quality in addition
to the use of crowd self-correction or of selected volunteers for data
validation (Fonte et al., 2015b). Nevertheless, these combinations are
particularly desirable due to the dynamic characteristic of VGI, which makes the
use of traditional approaches, which take time and require expert intervention,
less suitable.

Although VGI has many similarities to authoritative GI, one of the main
differences is the much more relaxed nature of the data collection protocols.
The need for more VGI protocols, including the need for a framework that
considers quality as one element, is addressed in Chapter 10 (Minghini et al.,
2017). Chapter 10 also considers how quality assurance can be influenced by
technological solutions that can help to seamlessly enforce protocols and
thereby increase data quality, while recognising the trade-offs between the
complexity of the protocol and participant motivation and retention.

The quality of VGI will continue to be one of the most important barriers to the
integration of VGI with authoritative data, and developing generic and flexible
solutions such as the system proposed by Bordogna et al. (2015) represents one
tangible step forward; thus, we envisage that workflow developments will be a
key area of research in the future. Standards agencies also need to recognise
that there are new sources of spatial data and that existing standards must be
adapted to include these sources or new standards must be developed. A first
step in this direction has been made by the W3C with a document (currently in
draft form; Tandy et al., 2016) on best practices that should be taken into
consideration when publishing and using spatial data on the Web. The document
highlights another aspect, and, in a sense, extends the notion of usability, by
drawing attention to the discoverability and accessibility of the spatial data
published.


Notes 


1  http://confluence.org/ 

2  https://www.hotosm.org/projects/indonesia-0 

Abstract

OpenStreetMap (OSM) is the most successful example of Volunteered Geographic
Information (VGI). It is also the most frequently used case study in research
that focuses on VGI quality, as it is usually considered a proxy for other VGI
projects. The research in this area usually focuses on comparisons with
authoritative data, measurements and quality statistics. In other papers,
scholars have explored quality frameworks or studied the motivation and
engagement of volunteers. This chapter examines OSM quality from a different
point of view. The focus here is on examining how the qualitative elements of
the micro-environment within OSM, such as data specifications and the OSM
editors, have evolved over time. We discuss how their evolution can affect OSM
data quality, taking into account a number of different factors and dimensions
that directly affect the quality of the contributions.

1 Introduction

OpenStreetMap (OSM) is one of the first examples of Volunteered Geographic
Information (VGI; Goodchild, 2007), and continues to be one of its prime
examples. VGI has been defined as ‘the widespread engagement of large numbers of
private citizens, often with little in the way of formal qualifications, in the
creation of geographic information’ (Goodchild, 2007). A number of factors have
helped this phenomenon to grow, including the removal of the selective
availability of the Global Positioning System (GPS) in 2000 (Clinton, 2000),
which has resulted in the proliferation of GPS-enabled devices, novel Web 2.0
practices and programming techniques as well as the development of spatial
applications and products based on global-wide maps of satellite imagery by
technology giants such as Google, Microsoft and Yahoo!. Since 2007, VGI has
become intertwined with crowdsourcing, active local communities and social
media, and thus can be found in many flavours and extracted from various sources
(for more details, see Chapter 2 by See et al., 2017), such as web applications
about toponyms, GPS tracks, sharing of geotagged photographs, synchronous
micro-blogging, social networking sites, etc. A very interesting, and equally
promising, interconnection of VGI is the one with the domain of citizen science
(Haklay, 2013). As the latter gains momentum, the need for geotagged
measurements and information is growing, and along with it the quest for solid
answers about the caveats and challenges that VGI projects face, especially with
respect to data quality. Thus, understanding how the most successful VGI project
(i.e. OSM) has evolved in terms of quality will give insights valuable to other
existing VGI projects or projects that will follow in the future, including
those in the citizen science domain. Spatial data quality is the cornerstone of
every spatial database, map, product or service. Measuring, understanding and
documenting the quality of spatial data is of paramount importance for any kind
of geodata, including VGI.

This chapter will examine OSM quality evolution from a new point of view. In
Section 2, quality evaluation procedures, as described in the ISO quality
framework, will be discussed. Then, in Section 3, the methodology for
understanding the evolution of OSM quality will be introduced. The central focus
will not be on the data themselves (as is usually the case in most OSM-based
quality studies), but rather on the micro-environment inside which OSM is
evolving. To this end, Section 4 will cover the evolution of OSM specifications,
taking into account a number of different factors and dimensions that directly
affect the quality of contributions; in Section 5, the evolution of OSM editors
will be examined, as they are literally the entry point for all OSM
contributions. Both sections will provide a critical view of the developments on
these two fronts and of their impact on the overall quality of OSM. The chapter
will conclude with a discussion of and conclusions on how all of these aspects
can provide a useful context for OSM quality evaluation.

The purpose of this chapter is not to provide measurements or quantitative
reports regarding the quality of OSM. Instead, the aim is to highlight new,
important facets of OSM quality that have not been considered to date in what is
otherwise a rich and growing literature on VGI quality. This chapter supports
the idea that the evolution of OSM data quality is closely related to
qualitative elements of the OSM micro-environment. These include the wiki-based,
and thus bottom-up built and constantly changing, specifications, the
digitisation software (i.e. the OSM editors), the mapping parties, the forums,
the voting system, the local and global OSM communities, the few, yet most
productive, contributors, and other seemingly small and unimportant factors that
in reality determine to a great extent the evolution of the OSM initiative and
consequently the quality of the data created. All of these factors are outside
the traditional quality elements for spatial data (ISO, 2005) or even the new
quality indicators suggested specifically for VGI (see Antoniou and Skopeliti,
2015 for an overview of these). This chapter focuses on two of these outside
factors: OSM specifications and OSM editors.


2  Spatial  Data  Quality  Evaluation  Procedures 

This book provides considerable material on the subject of spatial data quality.
For example, in Chapter 7, Fonte et al. (2017) discuss VGI quality and review
measures and indicators for this new breed of data. In Chapter 9, Skopeliti et
al. (2017) discuss best practices and methods for visualising VGI quality, while
Chapter 10, by Minghini et al. (2017), discusses best practices for data
collection, including quality considerations. Finally, in Chapter 13,
Olteanu-Raimond et al. (2017) examine the experience of European National
Mapping Agencies (NMAs) with VGI data and discuss methods for obtaining
contributions of high quality from volunteers.

Both  in  this  book  and  in  the  literature  available  on  the  subject  of  VGI  qual¬ 
ity,  most  VGI  cases  or  examples  come  from  the  OSM  project.  OSM  is  a  prime 
example  of  VGI  as  it  has  managed  to  provide  free,  constantly  updated,  crowd- 
sourced  data  for  the  globe.  However,  when  research  focuses  on  VGI  data  quality, 
scholars  tend  to  examine  some  of  the  spatial  quality  elements  for  a  given  study 
area,  e.g.  cities,  urban  areas  or  nationwide  (Antoniou,  2011;  Girres  and  Touya, 
2010;  Haklay  et  al,  2010;  Jokar  Arsanjani  et  al.,  2015).  The  studies  usually  fol¬ 
low  a  benchmark  evaluation  process,  which  involves  creating  a  copy  of  what  is  a 
continuously  changing  dataset,  and  then  evaluating  this  copy  as  if  it  were  a  static 
dataset.  This  method  gives  insight  into  the  data  quality  at  the  time  when  the 
copy  was  created;  thus,  these  efforts  provide  a  good  understanding  of  selected 
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quality  elements  at  a  given  point  in  time  compared  with  corresponding  authori¬ 
tative  datasets.  However,  spatial  datasets,  and  especially  VGI  ones,  are  not  static 
products  and  hence  time  is  a  critical  factor  that  is  not  often  considered.  The 
starting  point  for  a  spatial  product  is  the  specifications  that  will  be  used  to  create 
the  dataset.  Yet  these  specifications  can  change  over  time  for  both  authorita¬ 
tive  and  VGI  datasets.  In  fact,  the  latter  kind  of  Geographic  Information  (GI)  is 
more  susceptible  to  changes  in  specifications  since  bottom-up  processes  provide 
the  flexibility  for  new  rules  to  be  established  or  existing  ones  deprecated  more 
easily  by  the  community  of  volunteers.  While  the  path  of  evolution  and  change 
in  the  specifications  of  a  product  is  inescapable,  there  is  a  fundamental  differ¬ 
ence  in  how  each  source  of  GI  (i.e.  authoritative  or  VGI)  handles  their  dataset 
life-cycle.  For  example,  authoritative  data,  collected  by  NMAs  or  Commercial 
Mapping  Companies  (CMCs),  usually  follow  a  versioning  system.  Users  of  such 
data  are  notified  that  a  set  of  updates  is  available  or,  more  relevant  to  our  case, 
that  a  new  dataset  has  been  created  based  on  new  specifications.  The  product 
specifications  can  also  be  available  to  the  interested  parties.  A  case  in  point  can 
be  found  in  the  practices  of  the  UK’s  Ordnance  Survey  (OS).  For  the  OS  Mas- 
terMap  product  (OS  2001),  for  example,  OS  provides  a  detailed  document  that 
explains  how  each  physical  entity  is  conceived,  modelled  and  stored  and  thus 
what  accuracy  and  attributes  should  be  expected.  The  important  point  here  is 
that  while  a  new  dataset  is  developed,  or  during  the  migration  from  one  form  of 
specification  to  another,  the  datasets  are  not  accessible  to  the  users.  This  process 
takes  place  in-house,  and  only  when  the  whole  process  has  been  concluded  are 
the  data  available  for  use.  This  is  in  contrast  with  what  takes  place  with  VGI.  In 
a  sense,  VGI  datasets  are  following  one  of  the  main  characteristics  of  Web  2.0 
(O’Reilly,  2007),  i.e.  perpetual  beta.  This  small  phrase  is  usually  applied  to  soft¬ 
ware  development  cycles,  and  means  that  there  are  no  versioning  cycles  but 
rather  a  continuous  effort  of  software  development  so  as  to  match  evolving  user 
needs;  here  this  notion  spills  over  to  datasets,  and  OSM  is  an  excellent  example 
for monitoring this. The perpetual editing of and changes to OSM specifications
have made OSM evolve from a dataset with a handful of layers and physical
features to an extremely detailed dataset, in many cases far more detailed than any
NMA or CMC dataset. The difference between VGI and authoritative data is
that, in VGI, the data remain available while the dataset evolves, without any
guarantee or indication of the state or compliance of each feature in relation
to a given version of the specification. It is not difficult to imagine
that this process, while it has many advantages, can create a series of
inconsistencies and, in fact, degrade the overall quality of the data.

Thus,  while  specification  improvements  might  eventually  be  a  necessary  step 
for  a  better,  more  inclusive,  detailed  and  meaningful  dataset,  during  the  transi¬ 
tion  time,  the  dataset  is  bound  to  suffer  from  inconsistencies,  mixed  feature 
versions  and  mixed  typologies  that  exist  in  former  and  latter  specifications. 
This  is  even  more  likely  if  there  is  a  perpetual  change  in  specifications  without 
any rigorous provision on how to manage the data transition and compliance.

Returning the discussion to quality evaluation processes, benchmark
comparisons are usually chosen not because they are necessarily the best way to
evaluate  the  data  quality  of  a  VGI  dataset  but  because  they  are  the  most  prac¬ 
tical  to  perform  and  report.  ISO  (2005)  explains  that  benchmark  procedures 
should  be  based  on  the  establishment  of  a  suitable  reporting  frequency.  Spo¬ 
radic  and  non-systematic  evaluations,  although  perfectly  acceptable  in  an  aca¬ 
demic  environment,  do  not  provide  a  clear  view  of  OSM  quality,  or  of  the  qual¬ 
ity  of  any  other  VGI  source.  To  this  end,  a  different  approach  suggested  by  the 
ISO  quality  framework  is  to  evaluate  constantly  changing  datasets,  as  is  the  case 
of  OSM  data,  using  a  continuous  process.  Here,  the  starting  point  could  again 
be  a  benchmark  test,  but  then  there  should  be  a  continuous  evaluation  of  the 
updates  and  of  the  impact  that  these  updates  might  have  on  the  overall  data¬ 
set.  However,  there  is  no  provision  made  for  specification  migration,  perhaps 
because  this  sense  of  perpetual  editing  is  not  applicable  to  authoritative  data. 


3  Methodology 

To  evaluate  OSM  evolution  from  a  quality  point  of  view,  we  need  to  consider 
what  process  to  use.  A  way  forward  is  to  follow  one  of  the  two  ISO  sugges¬ 
tions.  This  means  that  we  need  to  develop  a  benchmarking  method  that  will  be 
able  to  examine  an  instance  of  the  OSM  data  against  an  authoritative  dataset 
on  a  regular  basis  (e.g.  weekly,  monthly,  etc.).  For  a  number  of  reasons,  this 
is  not  straightforward.  First,  there  is  no  global-scale  authoritative  dataset  that 
could  play  the  role  of  the  reference  data.  Even  if  such  datasets  were  available 
for  academic  research,  it  is  not  clear  which  one  would  be  more  detailed  and  at 
which  places.  For  example,  Vandecasteele  and  Devillers  (2015)  report  that  in 
many  places  OSM  is  far  more  detailed  than  any  authoritative  dataset  available. 
Moreover,  such  an  approach  would  require  the  implementation  of  considerable 
amounts  of  brute  force  computing  on  a  regular  basis.  This  approach  would  be 
possible  in  the  context  of  confined  academic  experiments  that  would  test  either 
a  few  quality  elements  at  a  national  level  or  all  the  quality  elements  for  small 
areas,  but  it  would  be  difficult  to  achieve  and  maintain  both  globally  and  regu¬ 
larly.  The  same  applies  to  a  continuous  evaluation  process,  although  the  evalu¬ 
ation  of  the  quality  of  OSM  updates  is  a  more  straightforward  task,  given  the 
fact  that  OSM  provides  regular  updates  in  separate  files  and  for  various  time 
intervals.  However,  the  frequency  of  updates  is  inversely  related  to  the  number 
of  changes,  so,  for  practical  reasons,  evaluating  the  data  quality  continuously  is 
beyond  the  means  of  most  NMAs  or  CMCs. 

Hence,  an  alternative  approach  is  taken  here,  which  is  based  on  the  evalua¬ 
tion  of  factors  that  directly  affect  OSM  quality  but  are  currently  not  studied  by 
researchers,  i.e.  a  study  of  the  OSM  specifications.  The  value  of  specifications 
in  VGI  has  been  discussed  by  Brando  and  Bucher  (2010)  and  by  Brando  et  al. 
(2011). The form of, and the rules included in, a product’s specification, at any
given point in time, are fundamental. These, along with metadata, are the starting
point  that  allows  potential  users  to  understand  the  usability  of  the  data.  Moni¬ 
toring  and  documenting  the  changes  that  have  taken  place  in  the  specification 
of  OSM  over  time  could  add  another  tool  to  the  toolbox  used  for  OSM  quality 
evaluation,  and  could  provide  the  necessary  context  for  some  of  the  academic 
efforts  in  this  field. 

Moreover,  this  approach  will  be  coupled  with  an  evaluation  of  the  evolution 
of  OSM  editors.  OSM  contributions  are  uploaded  through  a  number  of  OSM 
editors  that  have  been  developed  and  updated  by  the  OSM  community  itself. 
The  editing  tools  and  the  overall  functionality  of  the  editor,  and,  more  impor¬ 
tantly,  the  editor’s  conformance  to  the  wiki  specifications,  play  a  significant  role 
in  the  kind  of  edits  submitted  and  consequently  in  the  quality  of  the  data  con¬ 
tributed. 


4  Evolution  of  OSM  Specifications 

4.1  General  Changes  to  the  Main  OSM  wiki  Page 

OSM  specifications  are  described  in  a  wiki-based  process.  The  starting  point  is 
a MediaWiki1 web page titled ‘Map Features’ (OpenStreetMap, 2016). This page
lists  all  of  the  physical  features  that  should  be  included  in  the  OSM  database, 
along  with  some  of  the  basic  attributes  that  should  describe  each  feature.  The 
OSM  community  decides  what  is  added  or  removed  from  this  list  through  a 
voting  system.  In  the  OSM  world,  the  features  are  called  keys  and  the  attributes 
values. In the ‘Map Features’ web page, the physical features are grouped into
categories  and  sub-categories  depending  on  their  semantics  and  nature.  For 
each  feature,  additional  information  is  available,  such  as  the  type  of  geometry 
that  should  be  used  (i.e.  node,  way  or  area),  comments  on  what  each  feature 
represents,  assisting  documentation  from  Wikipedia,  a  photograph  that  shows 
how  the  feature  appears  on  the  OSM  map  and  a  photograph  that  functions  as  a 
photo-interpretation  key.  The  latter  photograph  helps  the  contributors  to  better 
understand  how  to  assign  features  on  the  ground  to  the  OSM  nomenclature. 
Moreover, each key/value combination is further explained in other wiki pages,
which  themselves  include  more  details  about  the  way  the  feature  should  be 
digitised,  additional  attributes  that  could  further  describe  the  feature,  and  the 
possible  combinations  of  the  attributes. 

For  web  pages  created  with  MediaWiki,  it  is  possible  to  access  the  pages’  his¬ 
tory  and  trace  back  what  changes  have  been  made,  at  which  time  and  by  whom. 
Moreover,  a  short  summary  of  the  changes  is  available,  along  with  a  classifica¬ 
tion  of  whether  a  change  was  a  minor  edit  or  not  (computed  based  on  whether 
the  person  who  performs  the  edit  has  marked  the  edit  as  minor  or  not2).  Thus, 
in  order  to  understand  how  this  (quasi)  specification  of  OSM  has  evolved,  we 
examined how the ‘Map Features’ page has changed over time. At the time of
writing (May 2016), there were 847 versions of this wiki page alone, with the
first one dating back to 20 December 2005. This means that, on average, a major
or minor edit has taken place approximately every 4.4 days since then.
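
The average interval can be checked with simple date arithmetic. The sketch below is illustrative only: the function name is hypothetical, and because the text gives only the month of writing, the snapshot day (1 May 2016) is an assumption. The start date and the number of versions are taken from the paragraph above.

```python
from datetime import date


def mean_days_between_edits(first_edit: date, snapshot: date, n_versions: int) -> float:
    """Average number of days per stored page version between two dates."""
    if n_versions <= 0:
        raise ValueError("n_versions must be positive")
    return (snapshot - first_edit).days / n_versions


# 20 December 2005 and 847 versions come from the text; the exact day in
# May 2016 is not stated there, so 1 May 2016 is an assumed snapshot date.
interval = mean_days_between_edits(date(2005, 12, 20), date(2016, 5, 1), 847)
print(f"about {interval:.1f} days per version")
```

Any snapshot day in May 2016 gives a value between roughly four and four and a half days, in line with the figure quoted above.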

The  first  point  of  analysis  was  to  examine  when  each  version  was  released. 
Figure  1  shows  the  number  of  changes  per  year  and  the  corresponding  percent¬ 
age.  This  provides  a  good  understanding  of  whether  OSM  specifications  are 
constantly  changing  or  if  there  are  any  emerging  patterns.  Figure  1  shows  that 
most  of  the  changes  (88%)  have  taken  place  in  the  first  three  years  of  OSM’s 
life, while, from 2011 onwards, each year’s changes do not exceed 2%
of the total number of changes. This is an interesting observation as it paints
a picture of a crowdsourced product that has matured extremely fast compared to
the breadth and length of its aims (i.e. ‘to create and distribute free
geographic data for the world’3).
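
A per-year tally of this kind can be derived directly from the revision timestamps of a page history. The following is a minimal sketch, not the procedure used for Figure 1: it assumes the timestamps have already been exported as ISO 8601 strings (the form in which MediaWiki reports revision times), and the function name and the sample values are hypothetical.

```python
from collections import Counter


def changes_per_year(timestamps):
    """Return {year: (count, percent_of_total)} for ISO 8601 timestamps."""
    years = Counter(int(ts[:4]) for ts in timestamps)
    total = sum(years.values())
    return {year: (n, 100.0 * n / total) for year, n in sorted(years.items())}


# Made-up timestamps, used only to show the shape of the result.
sample = [
    "2006-01-02T10:00:00Z",
    "2006-05-01T09:30:00Z",
    "2007-03-03T12:00:00Z",
    "2012-07-07T08:00:00Z",
]
for year, (count, share) in changes_per_year(sample).items():
    print(year, count, f"{share:.0f}%")
```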

The  next  step  is  to  analyse  the  importance  of  these  changes.  Taking  into 
account  the  automatic  assignment  of  an  edit  into  minor  or  not,  we  explored 
when  and  how  many  edits  take  place  each  year  for  each  kind  of  change.  It  is 
understandable  that  the  number  of  characters  changed  cannot  be  an  entirely 
safe  measure  of  a  change’s  importance.  However,  it  is  considered  as  a  good  indi¬ 
cator  that  can  give  a  basic  understanding  of  the  amount  of  work  put  forward  in 
every  change.  Figure  2  presents  the  percentage  of  major  and  minor  changes  per 
year.  Despite  being  a  fast  maturing  product  as  noted  above,  major  changes  in 
the  specifications  take  place  constantly.  This  observation  should  be  considered 
in  combination  with  that  of  the  flexibility  provided  to  contributors,  which  is  in 
line  with  the  openness  and  spirit  of  inclusiveness  that  characterises  the  OSM 
project.  For  example,  in  the  wiki-forums  it  is  explicitly  stated  that  the  OSM 
community  might  introduce  best  practices,  guidelines  or  even  deprecated  fea¬ 
tures  and  attributes  and  that  nothing  is  banned.  Contributors  are  free  to  add 
whatever  they  believe  will  better  describe  the  physical  world. 
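
The split between major and minor edits per year can be sketched in the same spirit. The record layout below is an assumption made for illustration: each revision is a dictionary with a `timestamp` string and a boolean `minor` flag (an export from a wiki history may encode the flag differently and would need normalising first). The function name is hypothetical.

```python
from collections import defaultdict


def minor_major_share(revisions):
    """Return {year: {"minor": pct, "major": pct}} for revision records.

    Each record is assumed to hold an ISO 8601 "timestamp" and an optional
    boolean "minor" flag; a missing flag is treated as a major edit.
    """
    counts = defaultdict(lambda: {"minor": 0, "major": 0})
    for rev in revisions:
        year = int(rev["timestamp"][:4])
        kind = "minor" if rev.get("minor") else "major"
        counts[year][kind] += 1

    shares = {}
    for year, c in sorted(counts.items()):
        total = c["minor"] + c["major"]
        shares[year] = {kind: 100.0 * n / total for kind, n in c.items()}
    return shares
```

Because the flag is set by the editor rather than derived from the size of the change, a complementary measure such as the number of characters changed would have to be tallied separately.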

Thus, inconsistencies and mismatches in the keys and values used can come
from both a ‘formal’ change in the specifications and the free choice of
key/value combinations available to users. Interestingly, in the case when changes in
the  specification  are  introduced,  automatic  correction  of  the  existing  features  is 
highly  discouraged;  the  rules  state:  ‘Under  no  circumstances  should  you  auto¬ 
matically  (or  semi-automatically)  change  “deprecated”  tags  to  something  else 
in  the  database  on  a  large  scale  without  conforming  to  the  Automated  Edits 
code of conduct. Any such edits will be reverted’4.

Fig. 1: Number of changes per year for the OSM Map features wiki page.

Fig. 2: Major and minor changes to the Map features wiki page.


4.2 Development of Feature Specifications

The analysis so far has provided an initial overview of the OSM specification’s
development over time. Now the focus turns to the actual changes that took
place. For practical reasons, a selection of some of the 847 ‘Map Features’
page versions had to be made in order to use them for comparison. The
versions selected were those closest to the end of each calendar year from
2006 up until 2015. Then, in order to better monitor the development of the
specification,  we  examined  the  alterations  that  took  place  in  four  dimen¬ 
sions:  the  vertical,  horizontal,  in-depth  and  internationalisation  dimensions. 
All  four  dimensions  are  closely  related  to  the  OSM  data  (in  fact  are  different 
aspects  of  the  OSM  content)  and  thus  can  provide  a  helpful  point  of  view 
in  the  effort  to  assess  data  quality.  We  define  the  vertical  dimension  as  the 
number  of  physical  features  described  in  the  wiki  page,  while  the  horizon¬ 
tal  dimension  is  the  information  available  for  each  feature  (i.e.  keys,  values, 
comments,  rendering  instructions  and  photographs;  all  of  these  are  help¬ 
ful  in  guiding  the  contributors  to  correctly  capture  physical  features).  The 
in-depth dimension is considered to be the extra information available for
each feature: both keys and values are usually further analysed in separate wiki
pages where, for example, possible key/value combinations or more detailed
instructions about their proper use are provided. Finally, the
internationalisation dimension is defined as the availability of the specification in different
languages.  In  general,  wiki  pages  can  be  translated  and  exist  simultaneously 
in  different  languages,  and  thus  can  be  read  and  accurately  comprehended 
by  many  people  around  the  world;  similarly,  OSM  specifications  need  to  be 
understood  by  the  largest  possible  audience  in  order  to  successfully  achieve 
the  aim  of  creating  a  global  map. 
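
The selection of one page version per calendar year can be sketched as follows. This is one possible reading of "closest to the end of each calendar year", namely the latest revision saved on or before 31 December; the authors' exact rule is not stated, and the function name and input format (ISO 8601 timestamps ending in `Z`) are assumptions.

```python
from bisect import bisect_right
from datetime import datetime


def year_end_snapshots(timestamps, first_year, last_year):
    """Map each year to the latest revision time on or before 31 December.

    Years that precede the first revision map to None.
    """
    ordered = sorted(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ") for ts in timestamps
    )
    snapshots = {}
    for year in range(first_year, last_year + 1):
        cutoff = datetime(year, 12, 31, 23, 59, 59)
        idx = bisect_right(ordered, cutoff)  # number of revisions up to cutoff
        snapshots[year] = ordered[idx - 1] if idx else None
    return snapshots
```

With this rule, a year without any edits simply inherits the last version from an earlier year, which is the state of the page a contributor would have seen at that time.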

A  number  of  illustrative  examples  are  provided  for  each  dimension.  These 
examples  aim  to  provide  a  picture  of  the  changes  that  have  taken  place  in  the 
OSM  specification  over  time  and  help  researchers  understand  both  the  volatility 
in  the  contributions  and  the  quality  that  comes  from  the  micro-environment  in 
which  OSM  is  developing. 


4.2.1 Changes in the Vertical Dimension

One  interesting  aspect  in  the  evolution  of  the  OSM  specification  is  to  examine 
how  the  major  OSM  categories  have  evolved.  This  vertical  examination  of  the 
‘Map  Features’  page  gives  a  sense  of  how  the  nomenclature  of  OSM  has  changed 
through  the  addition  and  removal  of  categories  and  features  in  the  list  of  enti¬ 
ties  that  OSM  uses  to  describe  the  world.  Table  1  shows  the  number  of  active 
categories  at  the  end  of  each  calendar  year;  moreover,  it  shows  how  many  cat¬ 
egories  have  been  added  or  removed  compared  to  the  previous  year. 

It can be seen that major additions took place during 2008, when 48
categories were added. From then, new feature categories are added almost every
year, but interestingly there are also categories that have been removed as
independent typologies in the nomenclature of OSM and have been merged
with others. Examples of the categories added include power and shop in 2007,
facilities, education and transportation in 2008, geological in 2009, emergency,
medical rescue and firefighters in 2010, commercial and civil amenity in 2011
and traffic calming in 2014. Examples of removals include the categories of
cycleway, tracktype, abutters and naming in 2012.

Table 1: Additions and removals of OSM categories from the Map Features
wiki page.

                                2006  2007  2008  2009  2010  2011  2012  2013  2014  2015
Categories Present                28    32    78    83    90    97    91    93    96    93
Categories Removed*                -     0     2     0     0     0     4     0     0     3
Categories With a Name Change*     -     0     0     0     0     0     2     0     0     0
Categories Added*                  -     4    48     5     7     7     0     0     3     0

* compared to the previous year
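
The year-on-year comparison behind Table 1 amounts to a set difference between consecutive snapshots of the category list. The sketch below illustrates the idea under stated assumptions: the function name is hypothetical, and the input is a mapping from year to the set of category names found in that year's snapshot, which would have to be extracted from the wiki page beforehand.

```python
def category_changes(snapshots):
    """Summarise additions and removals between consecutive yearly snapshots.

    snapshots: {year: set of category names}. The first year has no
    predecessor, so its "added" and "removed" entries are None.
    """
    rows = {}
    previous = None
    for year in sorted(snapshots):
        current = snapshots[year]
        if previous is None:
            rows[year] = {"present": len(current), "added": None, "removed": None}
        else:
            rows[year] = {
                "present": len(current),
                "added": sorted(current - previous),
                "removed": sorted(previous - current),
            }
        previous = current
    return rows
```

A plain set difference reports a renamed category as one removal plus one addition, so name changes need to be identified separately if they are to be counted in their own row.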

Apart  from  the  changes  in  the  major  OSM  categories,  there  have  also  been 
changes  recorded  to  the  features  in  each  category.  Tables  2,  3  and  4  present 
illustrative  examples  of  how  selected  features  have  evolved  over  time.  More 
specifically,  Table  2  shows  the  sub-categories  of  Highways  and  Places  as  well  as 
the  number  of  distinct  features  included  in  each  of  these  sub-categories.  It  can 
be  seen  that,  for  these  two  major  categories,  which  in  fact  include  all  road  net¬ 
work  and  all  gazetteer  data,  there  have  not  been  any  changes  since  2008.  This 
does  not  mean  that  there  have  not  been  changes  in  the  wiki  pages  that  further 
explain  the  attributes  of  each  distinct  feature,  but  that  at  least  at  this  high  level 
the nomenclature has been stable since 2008. The flip side is that, while the
geometry (i.e. positional accuracy) of road network or place features might
still be correct, features that have not been updated since 2007 are likely to
suffer from attribution inconsistencies that affect their thematic accuracy and
logical consistency.
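
One simple way to act on this observation is to flag features whose last edit predates a given change in the specification. The sketch below is an assumption-laden illustration, not part of the original analysis: the record layout (`id`, `last_edit`), the function name and the cut-off date of 1 January 2008 are all hypothetical, since the text gives only the year of the change.

```python
from datetime import date


def stale_features(features, spec_change):
    """Return the ids of features last edited before spec_change."""
    return [f["id"] for f in features if f["last_edit"] < spec_change]


# Made-up records; 1 January 2008 is an assumed cut-off for the 2008 typology.
roads = [
    {"id": 1, "last_edit": date(2007, 6, 30)},
    {"id": 2, "last_edit": date(2009, 2, 14)},
]
print(stale_features(roads, date(2008, 1, 1)))
```

A flagged feature is not necessarily wrong; it is only a candidate for a check of its attributes against the current nomenclature.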

Table  3  shows  how  the  Buildings  category  has  evolved.  Here  again,  at  the  sub¬ 
categories  level  and  in  terms  of  the  number  of  features  per  sub-category,  Build¬ 
ings  have  been  stable  since  2011.  The  interesting  point  here  is  that  this  major 
category,  which  includes  the  footprints  of  buildings,  was  introduced  in  OSM  in 
2011.  Thus,  areas  that  have  not  been  updated  since  2011,  either  because  there 
was  a  bulk  upload  in  the  past  or  because  the  area  was  mapped  by  a  very  produc¬ 
tive  user  that  did  not  return  to  update  it  (for  more,  see  Antoniou  and  Schlieder, 
2014),  would  probably  not  have  this  type  of  feature,  since  capturing  buildings 
was out of the scope of OSM before 2011.

Table 2: The number of sub-categories and distinct features (keys) included in
the Highways and Places main OSM categories from 2006 to 2015.

Primary Feature Category /                                  Distinct Features (Keys)
  Feature Sub Category                                      2015  2014-2009  2008  2007  2006

Highways                                                          No change         47*   42*
  Roads                                                        8                8
  Link roads                                                   5                5
  Special road types                                           6                6
  Paths                                                        4                4
  When sidewalk (or pavement) is tagged on the main roadway    1                1
  When cycleway is drawn as its own way                        1                1
  Cycleway tagged on the main roadway or lane                  8                8
  Lifecycle                                                    2                2
  Attributes                                                  27               27
  Other highway features                                      18               18

Places                                                            No change         15*   15*
  Administratively declared places                             7                7
  Populated settlements, urban                                 7                7
  Populated settlements, urban and rural                       6                6
  Other places                                                 6                6
  Additional attributes                                        6                6

* Different groupings and typologies used for OSM Keys

Finally,  Table  4  shows  the  changes  in  the  Additional  Properties  category.  This 
category  was  introduced  in  2012  as  a  successor  to  the  Naming  category,  and 
includes  important  features  and  information  such  as  Addresses,  Annotation  and 
Name.  However,  it  can  be  seen  that  there  are  frequent  and  important  changes 
in  OSM  typology  that  make  it  difficult  for  contributors  to  follow  all  the  specifi¬ 
cation’s  provisions.  For  example,  Addresses  did  not  exist  until  2008;  it  was  later 
added  to  the  Naming  category,  and  then,  in  2012,  it  was  re-assigned  to  Addi¬ 
tional  Properties.  Similarly,  Place  was  removed  from  the  Additional  Properties 
category and formed a new one.

Table 4: The number of sub-categories and distinct features (keys) included in
the Additional Properties main OSM category from 2006 to 2015.

Primary Feature Category: Additional Properties
Feature Sub Categories: Addresses (Tags for individual houses; For countries
using hamlet, subdistrict, district, province, state; Tags for interpolation
ways), Annotation, Name, Properties, References, Restrictions, Places, Editor keys
Columns: Distinct Features (Keys) for each year from 2006 to 2015

Apart from the distinct feature keys that have been added or removed over
time,  major  changes  in  how  the  OSM  community  models  the  world  took  place 
in  2008  and  2012.  In  2006,  the  world,  according  to  OSM,  was  divided  into  a 
number  of  major  categories:  Physical,  Non  Physical,  Abutters,  Accessories,  Prop¬ 
erties,  Restrictions,  Naming  and  Annotation.  During  the  next  year,  these  major 
categories  were  further  enriched  with  sub-categories,  and  then,  in  the  following 
year,  there  was  another  typology.  Indeed,  in  2008  there  were  only  three  major 
categories:  Physical,  Non-Physical  and  Naming.  The  first  category  went  from 
including  17  sub-categories  to  including  59,  while  the  second  included  as  sub¬ 
categories  all  the  major  categories  of  2007  apart  from  those  specifically  related 
to  the  naming  process  (e.g.  Name,  References,  Places,  Annotation,  etc.),  which 
were  assigned  to  the  last  main  category. 

In  2012,  the  features  were  re-assigned  into  two  new  major  categories:  Pri¬ 
mary  Features  and  Additional  Properties.  The  Physical  sub-categories  were 
added  to  the  former  category,  but  it  also  included  sub-categories  from  the 
Non-Physical,  such  as  Route,  Boundary  and  Sport.  The  latter  category  remained 
with  six  main  sub-categories:  Addresses,  Annotation,  Name,  Properties,  Refer¬ 
ences  and  Restrictions.  Also,  in  2012,  some  major  changes  took  place  regarding 
the  grouping  of  the  physical  entities  in  various  sub-categories  and  classes.  For 
example, the entity Places, which used to be a class under the Naming
sub-category in 2011, became an independent sub-category in 2012 below the
Primary Features, while the Naming sub-category was assigned to the Additional
Properties category. Furthermore, during the study period (i.e. 2006-2015),
considerable volatility was recorded in some sub-categories. A case in point is
the Naming sub-category, which listed 3 features in 2007, 9 features in 2008 and
13 features in 2009 (before it was split again in 2012).

While  these  are  only  some  illustrative,  and  perhaps  confusing,  examples  of 
the  changes  recorded  in  the  OSM  specification,  two  things  are  evident  with 
respect  to  the  commitment  of  contributors.  First,  for  OSM  contributors  that 
have  been  consistently  contributing  during  the  entire  period,  it  should  have 
been  difficult  to  meticulously  follow  all  of  the  changes;  thus,  it  should  not  come 
as  a  surprise  that  even  experienced  users  might  have  introduced  errors  and 
inconsistencies  in  the  data.  On  the  other  hand,  there  are  either  occasional  con¬ 
tributors  or  contributors  that  have  just  a  short  active  period  and  never  contrib¬ 
ute  again;  for  both  of  these  types  of  contributors,  the  best  case  scenario  would 
be  that  contributors  have  consulted  the  active  specification  at  a  specific  point  in 
time  and  collected  the  data  based  on  this  version.  In  the  worst  case,  the  contri¬ 
butions  were  based  on  previous  knowledge  and  understanding  of  the  specifica¬ 
tion.  In  any  case,  and  taking  into  account  the  fact  that  automatic  corrections  are 
discouraged,  it  is  highly  likely  that  a  considerable  number  of  contributions  are 
out  of  date  in  terms  of  specification  compliance.  This  also  puts  quality  frame¬ 
works  that  are  based  on  contributor  evaluation  under  fresh  scrutiny  (see  e.g. 
D’Antonio et al., 2014; van Exel et al., 2010).


4.2.2 Changes in the Horizontal Dimension

The  ‘Map  Features’  page,  apart  from  the  addition  and  removal  of  new  cate¬ 
gories,  sub-categories  and  features,  has  also  changed  in  terms  of  the  available 
information  for  each  of  these  categories  and  features.  While  modest  changes 
have  been  recorded  compared  to  the  vertical  dimension,  this  horizontal  dimen¬ 
sion  still  plays  a  significant  role  in  the  rules  and  information  that  volunteers  are 
equipped  with  when  collecting  data  and  contributing  to  the  project. 

Two  illustrative  examples  are  presented  to  show  the  evolution  in  the  hori¬ 
zontal  dimension.  The  first  example  (Figures  3  and  4)  shows  one  of  the  major 
physical  entities:  Highways.  Even  from  the  early  days  of  the  OSM  project,  it  was 
made  clear  that  volunteers  needed  as  much  information  as  possible  in  order 
to  be  able  to  unequivocally  distinguish  between  and  capture  various  physical 
entities.  However,  the  actual  information  available  was  not  enough  for  safely 
guiding  volunteers.  For  example,  at  the  end  of  2006  (Figure  3),  the  main  fea¬ 
ture-attribute  combination,  which  is  a  description  of  what  each  feature  name 
represents  and  how  features  are  portrayed  on  the  OSM  map,  became  available. 
Thus,  in  practice,  a  volunteer  could  use  only  the  short  description  as  a  guide  for 
interpreting  the  entity  before  digitising  and  assigning  it  to  the  correct  category. 
For  more  information,  the  volunteer  would  have  had  to  follow  a  link  attached 
to  the  Highway  key.  At  the  end  of  2006,  a  small  number  of  photographs  and 
basic  information  was  available  so  as  to  guide  the  contributors.  It  is  obvious 
that  the  incomplete  description  of  each  feature,  although  it  does  not  stop  con¬ 
tributors  collecting  the  data,  makes  the  collection  error  prone  in  terms  of  the¬ 
matic  and  logical  consistency,  and  especially  so  at  a  time  when  satellite  imagery 
was  not  so  common  and  was  of  low  resolution  when  it  was  available. 

In  contrast,  Figure  4  shows  the  current  specification  section  of  Highways. 
The  available  information  for  each  physical  feature  has  expanded  to  include 
a  photo-interpretation  key  that  can  more  easily  guide  contributors.  Further¬ 
more,  apart  from  the  link  attached  to  the  highway  key,  which  links  to  a  page 
more  detailed  than  the  2006  one,  each  value  also  has  its  own  wiki  page  (see 
also  Section  4.2.3).  In  these  pages,  more  details  are  provided  regarding  what 
is  preferable  for  the  volunteers  to  follow  and  what  to  avoid.  Moreover,  a  wide 
list  of  possible  key-value  combinations  is  provided,  with  explanations  and 
examples. 

A  similar  example  is  provided  by  contrasting  the  2006  and  2015  wiki  pages 
on  aerialways  (Figures  5  and  6).  As  this  feature  is  not  one  of  the  fundamental 
entities  of  a  base  map,  there  was  only  a  basic  description  of  it  in  2006  (Figure  5; 
note  also  that  the  structure  of  the  table  is  different  from  that  of  the  table  for  the 
highways  of  2006).  In  contrast,  in  2015  (Figure  6),  the  available  information  is 
as  complete  as  that  of  the  highways.  Moreover,  the  comments  are  supported  by 
Wikipedia  articles  and  some  basic  instructions  are  given  about  the  key-value 
information. 


Fig. 3: Part of the Map Features wiki page (end of 2006) that specifies various
types of roads that should be captured (includes key, value, comments and
default rendering of the entity on the OSM map). ©OpenStreetMap contributors.

Fig. 4: [...] photo-interpretation key). ©OpenStreetMap contributors.

Fig. 6: [...] photo-interpretation key). ©OpenStreetMap contributors.

We have used these two examples to highlight the evolution of the OSM
specification.  From  2006  to  2015,  each  feature  followed  its  own  pace  regarding 
the  available  information  provided  to  the  OSM  community.  Thus,  the  quality  of 
the  contributions  for  each  feature  could  have  varied  accordingly.  The  mobilisa¬ 
tion  of  thousands  of  enthusiastic,  yet  mostly  inexperienced,  contributors  has 
inevitably  led  to  ‘learning-by-doing’  in  the  face  of  incomplete  and  changing 
specifications. 


4.2.3  Changes  in  the  In-depth  Dimension 

The in-depth dimension of the ‘Map Features’ page has been briefly discussed in
the  previous  section.  It  refers  to  the  available  information  for  each  key/value 
combination  and  the  attribution  process  that  contributors  should  follow.  As 
explained,  each  physical  entity  has  developed  independently  and  the  level  of 
detail  might  vary  considerably  at  different  time  periods.  Here  we  provide  one 
example  to  illustrate  changes:  unclassified  roads.  Figure  7  shows  the  unclassi¬ 
fied  roads  wiki  page  at  the  end  of  2008,  which  included  the  basic  information 
regarding  the  mapping  of  the  highway=unclassified  combination. 

In  contrast,  the  same  page  at  the  end  of  2015  (Figure  8)  includes  more  detailed 
information  about  the  preferable  attributes  that  can  be  assigned  to  this  entity 
as  well  as  instructions  about  how  to  map  the  entity,  when  it  is  applicable,  situ¬ 
ations  where  other  tags  should  be  used,  examples  of  determining  applicability 
and  even  disambiguation  instructions  when  the  public/private  status  is  unclear. 


4.2.4  Changes  in  Internationalisation 

Right  from  the  beginning  of  the  project,  OSM  aspired  to  create  a  global  and  free 
map.  It  is  obvious  that  this  could  not  be  achieved  without  global  participation. 
When  examining  the  internationalisation  of  OSM,  we  can  see  that  the  ‘Map 
Features’  page  is  currently  (i.e.  in  May  2016)  available  in  49  languages  (Table  5). 
Although  there  has  been  no  calculation  regarding  the  percentage  of  the  global 
population  covered,  it  is  clear  that  the  basic  rules  of  OSM  can  be  understood  by 
a broad audience. However, this was not always the case. Until the end of 2009,
the ‘Map Features’ page was only available in English. From the end of 2010
until 2015, however, the number of available languages was 45.

Apart from the ‘Map Features’ page, which is the starting point of the
specification, there are documentation pages for each OSM key and value in order to
better explain the use cases and the most appropriate combinations. These pages
should also be available in as many languages as possible. However, their
availability varies and, in general, there are considerably fewer available languages
than for the ‘Map Features’ page. For example, the key aerialway is available
in 10 languages (čeština, deutsch, english, italiano, magyar, polski, português
do Brasil, русский, 한국어 and 日本語) while the combination amenity=cafe
Fig. 7: The wiki page that specifies the use of the highway=unclassified combination (end of 2008). © OpenStreetMap contributors.
Fig. 8: The wiki page that specifies the use of the highway=unclassified combination (end of 2015). © OpenStreetMap contributors.
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Table  5:  Available  languages  for  the  Map  Features  wiki  page  (as  of  May  2016). 


 #  Language               #  Language               #  Language          #  Language
 1  asturianu             14  Hrvatski              27  Română           40  Ελληνικά
 2  azərbaycanca          15  Íslenska              28  Shqip            41  [illegible]
 3  Bahasa indonesia      16  Italiano              29  Slovenčina       42  [illegible]
 4  bosanski              17  kreyòl ayisyen        30  Slovenščina      43
 5  català                18  kréyòl gwadloupéyen   31  Suomi            44
 6  čeština               19  Latviešu              32  Svenska          45  中文 (简体)
 7  dansk                 20  Lietuvių              33  Tiếng Việt       46  中文 (繁體)
 8  Deutsch               21  Magyar                34  Türkçe           47  [illegible]
 9  eesti                 22  nederlands            35  српски/srpski    48
10  english               23  norsk bokmål          36  Български        49
11  español               24  Polski                37  Македонски
12  esperanto             25  Português             38  Русский
13  français              26  português do Brasil   39  Українська

is available in 12 languages (čeština, deutsch, eesti, english, français, italiano,
nederlands, português do Brasil, русский, ελληνικά, 日本語,


5 Evolution of OSM Editors

5.1 The Usage of the OSM Editors

An  important  component  of  the  micro-environment  of  OSM  is  the  editing 
tools.  The  OSM  editors  used  by  volunteers  play  an  important  role  as  they  pri¬ 
marily  dictate  the  type  and  quality  of  the  data  contributed.  For  example,  an 
embedded  functionality  in  an  OSM  editor  can  direct  the  volunteer  to  or  avert 
them  from  specific  choices  that  can  improve  or  deteriorate  the  quality  of  the 
contribution.  There  are  currently  a  large  number  of  OSM  editors  available  for 
various  media,  from  online  browser  editors  (e.g.  iD  and  Potlatch  2),  to  desktop 
and offline editors such as JOSM and Merkaartor, to GIS software add-ons, e.g.
for QGIS and ArcGIS, through to editors for mobile devices, such as Vespucci
and OsmAnd. By reviewing the history of the OSM wiki pages dedicated
to  editors5,  it  becomes  clear  that  the  number  of  available  editors  has  increased 
as  the  project  has  developed  (Figure  9). 
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Fig.  9:  Number  of  OSM  editors. 


The variety and the large number of OSM editors currently in use indicate
the  degree  of  interest  in  the  OSM  project.  However,  this  wide  range  of  OSM 
editors  diversifies  the  data  sources  and  can  possibly  affect  the  coherence  and 
homogeneity  of  the  contributions.  Indeed,  at  the  time  of  writing  (i.e.  May 
2016),  there  were  27  editors  available  for  the  OSM  community  to  choose  from. 
This  freedom,  while  in  line  with  the  ideology  of  a  crowdsourced  project,  might 
undermine  the  overall  effort  for  a  usable  dataset  of  high  quality.  However,  the 
flip  side  of  this  observation  might  reside  in  the  penetration  that  selected  edi¬ 
tors  have  in  the  OSM  community.  Indeed,  by  examining  the  statistics  from 
the  OSM  wiki  pages6  regarding  the  most  popular  editors,  a  more  encouraging 
picture  is  painted.  By  using  the  number  of  changesets  as  a  criterion  for  the 
years  2009  to  2015  (Figure  10),  it  can  be  seen  that  the  most  popular  editors  in 
2015  are  iD,  JOSM  and  Potlatch  2.  An  OSM  changeset  is  a  group  of  changes 
made  by  a  single  user  over  a  short  period  of  time.  One  changeset  might  include 
a  number  of  edits  (see  below)  such  as  the  addition  of  new  elements  and  tags  or 
a  change  in  values. 
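
The share of changesets per editor can be illustrated with a minimal sketch. It assumes that each changeset carries a created_by tag naming the editing software, and that the version part of that string can be stripped with a simple rule; the function names, the stripping rule and the sample values are illustrative assumptions, not the procedure behind the OSM wiki statistics.

```python
import re
from collections import Counter


def editor_name(created_by: str) -> str:
    """Reduce a created_by string to an editor name (heuristic, assumed rule)."""
    # Drop anything after a '/', e.g. 'JOSM/1.5 (8800 en)' -> 'JOSM'
    name = created_by.split("/", 1)[0].strip()
    # Drop a trailing dotted version, e.g. 'iD 1.7.5' -> 'iD';
    # 'Potlatch 2' is kept as it is because '2' has no dot
    name = re.sub(r"\s+\d+(\.\d+)+$", "", name)
    return name or "unknown"


def changeset_share(changesets):
    """Return {editor: percentage of changesets}, rounded to one decimal."""
    counts = Counter(editor_name(cs.get("created_by", "")) for cs in changesets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {editor: round(100.0 * n / total, 1) for editor, n in counts.items()}


# Hypothetical sample: ten changesets, each reduced to its tags
sample = (
    [{"created_by": "iD 1.7.5"}] * 4
    + [{"created_by": "JOSM/1.5 (8800 en)"}] * 3
    + [{"created_by": "Potlatch 2"}] * 2
    + [{"created_by": "Vespucci 0.9.7"}]
)

if __name__ == "__main__":
    for editor, share in sorted(changeset_share(sample).items()):
        print(f"{editor}: {share}%")
```

Counting edits rather than changesets would only require weighting each changeset by the number of changes it contains, which is why the two rankings can differ.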

While the OSM community seems to have settled on using primarily 3 out
of the 27 editors available, the findings in Figure 10 raise concerns regarding
the quality and homogeneity of the contributions submitted with other editors
in the past. For example, Potlatch 1, which used to be one of the most popular
editors in 2009, is now abandoned, and Potlatch 2 has been completely rewritten.
Similarly, Merkaartor, which provided 4-5% of changesets each year from
2009 until 2011, has now almost entirely disappeared. Interestingly, purpose-built
editors for mobile devices have not managed to diffuse into the OSM
community. For example, Vespucci has a small percentage, i.e. around 1%. The
most popular editor between 2009 and 2012 was JOSM, followed by the online
editors on the OSM website: initially Potlatch 1, and then Potlatch 2 and iD.
However, since 2014, iD has been the most frequently used editor when
counting changesets. Yet when measuring the number of edits, JOSM has been
the most popular editor since 2010 (Figure 11). Nevertheless, in 2015, JOSM
use decreased by 5.6% while iD use increased by 4.1%.

Fig. 10: Percentage of changesets per OSM editor.

From  what  has  been  presented  so  far,  it  is  evident  that  there  is  a  strong  vola¬ 
tility  in  the  choices  of  the  OSM  community.  The  majority  of  the  changesets  and 
edits  take  place  through  a  small  number  of  editors  that  succeed  each  other  over 
time.  While  the  aim  of  this  chapter  is  not  to  compare  and  evaluate  the  func¬ 
tionality  of  each  editor,  it  is  to  be  noted  that  the  potential  differences  in  their 
functionality  or  abidance  to  the  OSM  specifications  might  cause  inconsisten¬ 
cies  and  deteriorate  the  overall  quality  of  the  data  submitted.  However,  on  the 
positive  side,  the  strength  and  devotion  of  the  OSM  community  in  creating 
new  editors  that  adapt  to  new  challenges  and  requirements  can  be  seen. 


5.2  The  Functionality  of  the  Editors 

Apart  from  the  number  of  OSM  editors  available,  what  has  also  changed  is  their 
functionality.  The  existence  of  a  set  of  rules  that  function  as  a  product  specifica¬ 
tion  also  needs  to  be  supported  by  the  available  tools  for  the  task.  Thus,  the  level 
and  efficiency  of  the  editors  at  any  given  point  in  time  plays  a  crucial  role  in  the 
quality  of  the  contributions.  Here  we  present  the  evolution  of  the  functionality 
across  the  active  editors  from  2006  to  the  present: 


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Fig.  11:  Percentage  of  edits  per  OSM  editor. 


• In 2006, the OSM editors serve only to upload GPS tracks. Only the online
editing applet provides a Landsat photo, and thus GPS tracks cannot be verified
in comparison with a satellite image.

•  In  2007,  Landsat  overlay  becomes  available  in  JOSM  1.0  and  some  editing 
facilities  are  offered.  Merkaartor,  a  small  editor  for  OSM  with  some  unique 
features  like  anti-aliased  displaying  and  transparent  display  of  map  features, 
also  appears. 

• In 2007, the online editor applet displays Yahoo! Aerial Imagery under the
GPS trackpoints while editing. This is very useful, and in fact more accurate
than GPS data in the areas where coverage is most detailed (cities). In other
areas it may sometimes assist in correcting GPS tracks.

•  In  2008,  photomapping  is  added  in  JOSM,  which  allows  users  to  retrieve 
photographs  and  work  with  them  on  screen,  positioned  alongside  the  map 
data  in  the  editor.  In  addition,  if  GPS  location  information  is  included  in  the 
photograph  files  or  a  GPS  track  is  available,  JOSM’s  photograph  mapping 
features  can  be  used  to  see  them  in  context,  and  perhaps  position  new  ele¬ 
ments  based  on  the  recorded  photograph  positions. 

• In 2008, Merkaartor can use satellite imagery from Yahoo! or any other
Web Map Service (WMS).

• In 2009, JOSM acquires fast fluid panning and zooming, which provides
for  precise  mapping.  It  is  now  possible  to  work  offline  using  downloaded 
data  files,  local  photo  and  GPX  files.  Offline  editing  can  help  volunteers 
work  more  carefully  in  a  less  rushed  manner,  and  thus  could  provide  better 
contributions.  In  addition,  advanced  editing  functionality,  which  improves 
positional  accuracy,  becomes  available. 

•  In  2010,  Yahoo!  Aerial  Imagery,  Bing  and  other  aerial  imagery  become 
available  in  JOSM  as  backgrounds  for  tracing.  JOSM  also  supports  audio 
mapping.  Potlatch  2,  a  new  version  of  the  Potlatch  editor,  appears,  offering 
quite  a  different  editing  experience.  In  addition  to  this,  OSM  cooperation 
with  QGIS  and  Esri’s  ArcGIS  leads  to  add-ons  with  very  comprehensive 
GIS  capabilities  and  advanced  editing,  further  improving  quality. 

•  In  2015,  JOSM  provides  a  large  selection  of  aerial  imagery  and  third-party 
GPS  traces  as  backgrounds  for  tracing,  as  well  as  a  built-in  validator,  which 
checks  for  common  mapping  errors  before  the  data  are  uploaded.  Tags  are 
shown  to  users  directly  with  links  to  the  OSM  wiki  page,  which  returns 
information  for  a  tag.  In  iD,  custom  aerial  imagery  can  be  used,  photo¬ 
graphs  are  directly  available  in  the  editor  from  Mapillary7,  and  OSM  editors 
have  access  to  billions  of  GPS  tracks  recorded  by  Strava8  users,  which  allows 
for  very  precise  mapping  of  twisted  roads  and  trails.  Potlatch  2  develops 
advanced  features,  including  vector  backgrounds,  a  merging/conflation 
functionality  for  specialists  and  several  aerial  imagery  backgrounds,  which 
are  preconfigured,  as  well  as  the  introduction  of  an  option  for  custom  Tile 
Map  Service  (TMS)  imagery. 

•  At  the  time  of  writing  (May  2016),  JOSM  seems  to  be  the  most  promis¬ 
ing  editor  in  terms  of  quality  assurance  based  on  the  tools  offered,  such  as 
advanced  geometry  and  topology  editing;  the  resolving  of  conflicts;  the  tag¬ 
ging  of  presets;  a  validator  that  checks  for  common  mapping  errors  before 
data  upload;  selection  of  background  images  and  custom  TMS,  WMS  and 
Web  Map  Tile  Service  (WMTS);  selection  of  third-party  GPS  traces  imme¬ 
diately  available  as  backgrounds  for  tracing;  etc. 
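
To make the idea of a pre-upload validator concrete, the following is a minimal sketch of the kind of check such a tool might run. It is not JOSM's validator: the element structure (plain dictionaries), the three rules and the messages are assumptions chosen only for illustration.

```python
def validate(elements):
    """Return (element id, message) pairs for elements that look suspicious.

    The rules below are illustrative assumptions, not the JOSM rule set.
    """
    problems = []
    for el in elements:
        tags = el.get("tags", {})
        # A way needs at least two nodes to describe a line
        if el["type"] == "way" and len(el.get("nodes", [])) < 2:
            problems.append((el["id"], "way has fewer than two nodes"))
        # A way without any tag says nothing about what it represents
        if el["type"] == "way" and not tags:
            problems.append((el["id"], "way has no tags"))
        # A key with a blank value is most likely an editing slip
        for key, value in tags.items():
            if not value.strip():
                problems.append((el["id"], f"tag '{key}' has an empty value"))
    return problems


# Hypothetical edits waiting to be uploaded
pending = [
    {"type": "way", "id": 1, "nodes": [10, 11], "tags": {"highway": "unclassified"}},
    {"type": "way", "id": 2, "nodes": [12], "tags": {}},
    {"type": "node", "id": 3, "tags": {"amenity": " "}},
]

if __name__ == "__main__":
    for element_id, message in validate(pending):
        print(element_id, message)
```

Running checks of this kind before upload moves part of the quality control from later correction to the moment of contribution, which is the property highlighted for JOSM above.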

6  Discussion  and  Conclusions 

It  is  not  common  for  a  discussion  section  to  begin  with  what  the  study  has  not 
done.  Yet,  in  this  case,  it  is  necessary.  We  only  scratched  the  surface  of  what 
could  be  done.  We  sampled  only  a  few  of  the  847  versions  of  just  one  wiki 
page,  albeit  an  important  one,  and  we  used  these  to  examine  selected  cases 
of  the  changes  recorded.  The  entire  OSM  specification  consists  of  hundreds 
more  wiki  pages  with  information  about  each  feature  and  the  possible  key/ 
value combinations. Each of these extra pages has its own history, which might,
in turn, consist of hundreds of versions. The workload required to monitor
each and every change would be immense. The other thing that we did not do is
examine the evolution of the OSM editors from a data quality viewpoint. This would
require comparing the evolving functionality of all available editors against the
active OSM specification at each point in time; again, this is a
task that would be next to impossible.

The  value  of  this  chapter  is  in  its  context  and  orientation.  Regarding  the  for¬ 
mer,  the  methodology  chosen  did  not  try  to  provide  quantitative  descriptions 
of  different  quality  elements  or  indicators  but  rather  to  provide  context  and  to 
expand  the  discussion  on  OSM  quality  by  delving  into  the  micro-environment 
of OSM. Indeed, we treated the ‘Map Features’ wiki page (the main OSM
specification page) and the OSM editors as living organisms and chose to examine
how  they  have  grown  and  evolved  over  time.  By  not  studying  and  thus  not 
fully  understanding  the  environment  within  which  OSM  data  are  created, 
studies  on  the  subject  of  data  quality  do  not  have  a  solid  context,  i.e.  they  deal 
with  the  symptoms  and  ignore  the  cause.  This,  in  turn,  leads  us  to  orientation. 
VGI  quality  has  become  a  popular  subject  of  study  among  researchers.  Much 
of  the  literature  has  focused  on  the  nature  of  the  phenomenon  (Antoniou, 
2011), on the contributors (Ciepluch et al., 2011; Nedovic-Budic and
Budhathoki, 2010) and on the social engineering behind it (Haklay, 2010;
Haklay et al., 2010; Zielstra and Zipf, 2010). Other, more technical papers have
delved into statistics and measures of various quality elements and indicators
(Barron et al., 2014; Keßler and de Groot, 2013), usually by comparing OSM
data  with  authoritative  products.  In  this  chapter,  the  idea  was  to  re-orient  the 
discussion  towards  the  fundamentals  of  spatial  products.  The  specifications 
of  a  product  and  the  tools  available  to  produce  it  largely  define  the  outcome, 
regardless  of  the  effort,  the  workload  or  the  enthusiasm  put  into  producing 
it.  OSM  is  clearly  much  more  than  a  spatial  product,  and  the  value  of  VGI,  in 
general,  is  orders  of  magnitude  greater  than  the  achieved  quality  (Antoniou, 
2016).  However,  if  the  goal  is  to  improve  the  quality  of  VGI,  then  we  need 
to  have  a  better  understanding  of  the  micro-environment  within  which  each 
VGI  project  grows. 


Notes 


1  https://www.mediawiki.org 

2 https://meta.wikimedia.org/wiki/Help:Minor_edit

3  http://wiki.openstreetmap.org/wiki/Main_Page 

4  http://wiki.openstreetmap.org/wiki/Deprecated_features 

5  http://wiki.openstreetmap.org/wiki/Editors 

6  http://wiki.openstreetmap.org/wiki/Editor_usage_stats 

7  https://www.mapillary.com 

8  https://www.strava.com 
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Abstract 

The  flourishing  of  VGI  projects  has  transformed  the  average  web  user  into  an 
eager  geographic  data  user  and  contributor.  As  it  is  difficult  for  the  crowd  to 
perceive  VGI  quality,  visualisation  can  play  a  critical  role  in  communicating 
data  quality.  At  the  same  time,  although  VGI  quality  has  been  a  prominent 
research  topic  for  scientists,  quality  visualisation  has  not  been  exploited  to  its 
full  potential.  Since  the  crowd  encompasses  a  diverse  pool  of  users,  VGI  quality 
visualisation  caters  for  different  needs  and  exhibits  variable  functionality,  oper¬ 
ating  as  an  awareness  tool  for  the  novice  user  as  well  as  an  exploration  tool  for 
the  expert  user  /  scientist.  The  scope  of  this  chapter  is  to  present  a  framework 
for  VGI  quality  visualisation  that  takes  into  account  factors  such  as  methods 
for  quality  visualisation  of  spatial  data,  the  nature  of  VGI  data  quality,  user 
profiles  and  the  visualisation  environment.  In  addition,  a  review  of  the  available 
methods for data quality visualisation, which have emerged from cartography,
is presented, and a number of guidelines for VGI quality visualisation are
proposed, taking into account user characteristics.


Keywords 

quality,  VGI,  visualisation,  VGI  quality  awareness,  VGI  quality  exploration, 
visualisation  framework 


1  Introduction 

Quality  visualisation  of  geospatial  data  is  as  important  as  the  data  themselves 
(Pang,  2001).  The  recent  development  in  VGI  projects,  such  as  OpenStreetMap 
(OSM)  and  Geonames,  makes  this  topic  even  more  critical  and  challenging,  as 
novice  users  now  access,  use  and  create  geographic  information.  The  novice 
user  does  not  question  the  quality  of  VGI  data,  as  he/she  is  either  unaware  of 
the  quality  issue  or  erroneously  believes  that  quality  problems  do  not  exist  in 
the  dataset.  The  source  of  geographic  data  (i.e.  VGI  vs.  proprietary/authorita¬ 
tive)  is  not  perceived  as  an  important  factor  when  determining  the  credibility 
of a map (Parker, 2014). A map that is nicely designed in terms of cartography
and offered in an operational map environment, e.g. OSM, is considered a reliable source.
Judgement is based on peripheral signals such as visual design and symbology
(e.g. ‘if it looks good and attractive, then it is good’; Idris et al., 2011). Quality
reporting in text and tables may be easily understood by experts but not by the
diverse pool of VGI users. Since visualisation can communicate data quality to
all users (Buttenfield, 1983; Drecki, 2002; MacEachren et al., 2005), we
propose using visualisation to reveal VGI data quality.

VGI  quality  has  been  given  particular  attention  by  scientists.  Much  of  the 
work  concentrates  on  assessing  and  reporting  VGI  quality  in  diverse  outlets, 
but  only  a  few  studies  include  visualisations.  According  to  the  OSM  wiki,  there 
are  a  number  of  online  web  pages  characterised  as  ‘Visualisation  tools’1  related 
to  ‘Quality  assurance’.  However,  these  mainly  refer  to  error  and  bug  reporting 
tools  with  maps  and  do  not  constitute  an  actual  quality  visualisation  environ¬ 
ment.  Visualisation  has  not  been  exploited  to  its  full  potential  and  scientists 
have  not  taken  full  advantage  of  its  capabilities.  As  a  result,  researchers  miss 
aspects  of  VGI  quality  that  visualisation  could  reveal.  One  may  assume  that  in 
the  early  days  of  VGI,  VGI  quality  measures  and  indicators  were  not  mature 
enough  to  be  visually  represented:  past  research  has  suggested  that  without 
a  good  understanding  of  quality,  effective  approaches  to  visualisation  remain 
elusive (MacEachren et al., 2005). However, a review of the literature indicates
the existence of a plethora of measures and indicators that now manage to
successfully express VGI quality (see e.g. Antoniou and Skopeliti, 2015; Senaratne
et al., 2016).


1.1 The Role of VGI Quality Visualisation


Visualisation  can  be  used  to  communicate  VGI  quality  to  the  crowd  (Figure  1). 
Visualisation  transforms  VGI  quality  from  an  issue  that  is  rather  ignored  and 
difficult  to  perceive  into  a  perceptible  and  vivid  data  characteristic.  As  the 
crowd  consists  of  a  diverse  pool  of  users  in  terms  of  knowledge  and  experience 
with  spatial  data,  VGI  quality  visualisation  needs  to  satisfy  different  require¬ 
ments.  Visualisation  is  applicable  to  two  distinct  but  related  activities:  visual 
thinking,  which  is  exploratory  and  engages  scientists;  and  visual  communica¬ 
tion,  which  is  explanatory  and  refers  to  the  distribution  of  existing  knowledge 
(DiBiase  et  al.,  1992).  Thus  VGI  quality  visualisation  can  have  multiple  func¬ 
tionalities:  it  can  be  considered  as  an  awareness  tool  for  the  novice  user  as  well 
as  an  exploration  tool  for  the  expert  user  /  scientist.  Users  with  intermediate 
knowledge  and  experience  can  take  advantage  of  the  different  functionalities 
depending  on  their  abilities.  In  more  detail,  VGI  data  quality  visualisation  can 
be  considered: 

• An awareness tool for the novice user that can be used to draw the attention
of the crowd to VGI quality; force the crowd to question VGI quality;
communicate quality in a way that can be understood by the layperson;
stimulate contribution improvements; etc. Many research projects (MacEachren
et al., 1995; Leitner and Buttenfield, 2000; Cliburn et al., 2002; Deitrick,
2007) have demonstrated that quality visualisation supports the process of
decision-making and leads to significantly better decisions. Consequently,
it is important to inform users about data quality so that they can select VGI data
that are appropriate for a specific purpose. Although experts do not find
uncertainty visualisation overwhelming, confusing or useless (Kunz, 2011),
with so many non-expert VGI users, there is a need to make sure that
visualisation is understandable by all users, not only expert ones (Jones, 2011).
This can be achieved by exploring the full potential of data quality
visualisation and selecting the appropriate methods.

• An exploration tool for the expert user / scientist that can help researchers
to study the appropriateness and the ability of measures and indicators to
express quality; to discover dependencies on extrinsic socio-economic or
demographic factors; to explore the spatial distribution and heterogeneity
of VGI quality; etc.

Fig. 1: VGI data quality, visualisation, users and functionality.


1.2  A  Framework  for  VGI  Quality  Visualisation 

In the previous section, the role of VGI quality visualisation as an awareness
and as an exploration tool has been discussed. However, although VGI quality
visualisation is acknowledged as necessary, it is also considered a major
challenge (Sester et al., 2014). A framework that can facilitate and guide the
successful design of VGI quality visualisation is therefore welcome; the
framework proposed here acknowledges four interacting parameters
that influence VGI quality visualisation (Figure 2):

i)  VGI  Data  Quality:  The  framework  takes  into  account  the  nature  of  VGI 
datasets,  the  applicable  data  quality  elements  and  the  measures  and  indi¬ 
cators  used  to  measure  quality  -  see  Chapter  7  by  Fonte  et  al.  (2017)  and 
Chapter  13  by  Olteanu-Raimond  et  al.  (2017a). 

ii)  Quality  Visualisation  Methods:  Well  established  methods  for  spatial 
data  quality  visualisation  that  emerge  from  the  domain  of  cartography 
can  be  integrated  in  the  framework.  Accumulated  cartographic  knowl¬ 
edge  can  provide  a  number  of  best  practices  for  a  successful  visual  com¬ 
munication  and  exploration  of  quality  (see  Section  4). 

iii)  Users:  The  framework  caters  for  end  users  of  all  backgrounds.  The  mem¬ 
bers  of  the  diverse  pool  of  VGI  users,  who  range  from  novice  users  to 
scientists,  are  the  final  recipients  of  data  quality,  and  their  needs  should 
be  covered  through  effective  visualisation  processes. 

iv)  Medium/Visualisation  Environment:  The  framework  exploits  the 
opportunities  of  the  medium  used  to  deliver  the  map  (i.e.  computer  or 
mobile  devices)  and  the  availability  of  a  number  of  smart  tools  such  as  a 
graphical  user  interface  (GUI),  interactive  controls,  etc.  that  create  a  rich 
and  effective  visualisation  environment. 
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Fig.  2:  A  framework  for  VGI  quality  visualisation. 


The  above  factors  of  the  VGI  quality  visualisation  framework  are  discussed  in 
detail  in  Section  3. 

In this context, the chapter is structured as follows: Section 2 provides an
overview of the present status of VGI quality visualisation, Section 3 describes in
detail  the  elements  of  the  framework  for  VGI  quality  visualisation  and  Section 
4  presents  the  state  of  the  art  in  data  quality  visualisation  methods,  providing 
specific  guidelines  for  VGI  data  quality  visualisation.  The  chapter  ends  with 
conclusions  and  proposals  for  future  work. 


2  Present  Status  of  VGI  Quality  Visualisation 

2.1  Measures  and  Indicators  for  VGI  Quality 

Scientists assess VGI quality with measures and indicators (see Chapter 7 by
Fonte et al., 2017). A number of studies have tried to estimate VGI quality by
comparing VGI with proprietary data (e.g. Girres and Touya, 2010; Haklay,
2010; Zielstra and Zipf, 2010), utilising measures that emerge from quality
assessment, data matching, generalisation evaluation, etc. Because measures are
not sufficient for characterising VGI quality, academic research focuses on data
quality indicators. Indicators can be categorised into (Antoniou and Skopeliti,
2015): i) data indicators (see e.g. Barron et al., 2014; Ciepluch et al., 2010a;
Keßler and de Groot, 2013; van Exel et al., 2010); ii) demographic indicators
(see e.g. Haklay, 2010; Haklay et al., 2010; Mullen et al., 2015; Tulloch, 2008;
Zielstra and Zipf, 2010); iii) socio-economic indicators (see e.g. Antoniou,
2011; Elwood et al., 2013; Girres and Touya, 2010; Haklay et al., 2010); and
iv) contributor indicators (see e.g. D’Antonio et al., 2014; Nedovic-Budic and
Budhathoki, 2010). Since VGI quality is currently assessed with a plethora of
measures and indicators, the need for visual representation makes VGI quality
visualisation highly topical.


2.2  VGI  Quality  Visualisation 

Once  meta-information  about  VGI  quality  is  available,  there  are  different  ways 
to  portray  it  graphically.  Only  a  few  of  the  VGI  quality  studies  have  provided  a 
visualisation  of  the  quality;  the  next  paragraphs  present  a  detailed  review  of  the 
visualisation  methods  applied  in  these  studies. 


2.2.1  Measures 

A number of studies assess VGI quality with measures based on the
comparison of VGI and proprietary data and provide quality visualisation (e.g.
Antoniou, 2011; Fan et al., 2014; Forghani and Delavar, 2014; Haklay, 2010).
Values of quality measures (e.g. distance between features, length difference
of the road network, the area and density difference of buildings, etc.) are
calculated for a grid that covers the study area, and are portrayed utilising colour
schemes based on hue and value.
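
As an illustration of a grid-based measure, the sketch below computes the difference in road length between a VGI dataset and a reference dataset for each grid cell. It is a simplification under stated assumptions: roads are given as straight segments in planar coordinates, each segment is assigned wholly to the cell containing its midpoint (instead of being clipped at cell borders), and all names and sample values are hypothetical.

```python
import math
from collections import defaultdict


def length_per_cell(segments, cell_size):
    """Sum segment lengths per grid cell, keyed by (column, row)."""
    totals = defaultdict(float)
    for (x1, y1), (x2, y2) in segments:
        # Assumption: a segment belongs to the cell that holds its midpoint
        mid_x, mid_y = (x1 + x2) / 2, (y1 + y2) / 2
        cell = (math.floor(mid_x / cell_size), math.floor(mid_y / cell_size))
        totals[cell] += math.hypot(x2 - x1, y2 - y1)
    return totals


def length_difference(vgi_segments, reference_segments, cell_size):
    """Return {cell: VGI length minus reference length}."""
    vgi = length_per_cell(vgi_segments, cell_size)
    ref = length_per_cell(reference_segments, cell_size)
    return {
        cell: vgi.get(cell, 0.0) - ref.get(cell, 0.0)
        for cell in set(vgi) | set(ref)
    }


# Hypothetical segments in metres, evaluated on a 10 m grid
vgi_roads = [((0, 0), (3, 4)), ((10, 0), (10, 6))]
reference_roads = [((0, 0), (0, 4)), ((12, 12), (15, 16))]

if __name__ == "__main__":
    for cell, diff in sorted(length_difference(vgi_roads, reference_roads, 10).items()):
        print(cell, diff)
```

The per-cell values could then be grouped into classes and drawn with a diverging hue scheme, for example one hue where VGI exceeds the reference and another where it falls short.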


2.2.2  Contributor  Indicators 

Other studies assess the ‘perceived quality’ instead of the ‘measured quality’,
i.e. they portray users’ perception of the data quality, which is based on
personal opinion and on commentary and feedback from other users. Inspired by the
popular web rating system that is utilised in sites such as Amazon, eBay, iTunes,
etc. and that assesses quality on a 1 to 5 rating system, the quality visualisation
proposed by Jones (2011) results in a Virtual Globe with glyphs (e.g. star 2D,
star 3D), where visual variables such as size and colour portray the magnitude
of quality. Schiewe (2013) records the opinion of the user for the current region
of interest in OSM with a ‘like’ or ‘dislike’ button and visualises it with
pictograms such as smiling faces, targets, etc.


2.2.3  Data  Indicators 

In recent studies, a number of data indicators have been proposed and visualised.
Two different approaches are observed: indicators can be computed and
visualised at the feature level or using grid cells that cover the study area. In the
first approach, nodes, points and lines are used. For example, Trame and Keßler
(2011) visualised the number of versions for OSM POIs (Points of Interest)
by using a colour spectral scheme (heat map²) and overlaid the representation
onto OSM. In another study (van Exel, 2011a), contour lines were used to
visualise the average number of versions (updates) of any node in the OSM
database. Contours of different values were visualised with different hues. Van Exel
(2011b) also proposed a combined visualisation of two metrics for the linear
OSM features: (i) the time passed since a feature was last updated by the
community is visualised using a hue colour scheme and (ii) the number of
versions, indicating how many updates a feature has received since its creation, is
visualised using the width of the linear symbol. In another study (Keßler and de
Groot, 2013), the trustworthiness of selected features was assessed by the
numbers of versions, users, confirmations, corrections and rollbacks and was then
visualised with different hue colour schemes. Two cases of interactive
visualisation have also been recorded. Antoniou (2011) used an interactive map, which
could alternate between data and quality visualisation, to visualise conceptual
compliance to the OSM wiki-based specifications for each feature, using a hue
colour scheme. In iOSMAnalyzer (Barron et al., 2013), 25 intrinsic measures
referring to ‘General Area Information’, ‘Routing & Navigation’,
‘Address-Search’, ‘Points of Interest-Search’, ‘Map-Applications’ and
‘User-Information & -Behavior’ were calculated and portrayed in maps using hue
colour schemes.
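
The combined feature-level symbolisation described above (line width for the
number of versions, hue class for the time since the last update) can be
sketched in a few lines of Python. This is only an illustration: the function
name line_style, the class breaks of 180 and 730 days, the colour names and the
width formula are assumptions made here, not values taken from van Exel (2011b).

```python
def line_style(version_count, days_since_update, max_width=6.0):
    """Return a hypothetical line symbol for one linear feature.

    Width grows with the number of versions (capped at max_width);
    colour is a hue class chosen from the age of the last update.
    All thresholds are illustrative assumptions.
    """
    # Width: 1.0 for a feature that was never edited, +0.5 per extra version.
    width = min(1.0 + 0.5 * (version_count - 1), max_width)

    # Hue class: recently updated features get a "fresh" hue.
    if days_since_update <= 180:
        colour = "green"
    elif days_since_update <= 730:
        colour = "orange"
    else:
        colour = "red"
    return {"width": width, "colour": colour}
```

A frequently edited but stale road, for instance line_style(50, 1000), would be
drawn wide and red, so both metrics remain readable in a single symbol.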

Other studies in the literature take the second, grid-based approach. The
densities of points and other indicators (Ciepluch et al., 2010b) for OSM data
have been computed for a grid and visualised utilising a colour spectrum scheme.
In Roick et al. (2012), OSM data for Europe were divided into hexagonal cells
and a number of spatio-temporal quality metrics (user activity, topicality and
number of features) were calculated and visualised with hue and value colour
schemes in a web application. The conceptual compliance (Ballatore and Zipf,
2015) of tags was calculated on a 10 km² grid and portrayed using a value colour
scheme. In another study (Camboim et al., 2015), completeness (number of
buildings/km², road density, road length, percentage of unclassified roads) and
temporal quality (number of editors and days since last edit) were computed for
administrative regions and visualised utilising a number of hue and value colour
schemes.
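
As a minimal sketch of the grid-based approach, the following Python fragment
counts point features per square cell and assigns each count to a class that
could then be mapped to a colour. The function names, the planar (x, y)
coordinates and the class breaks are assumptions for illustration; none of the
studies cited above is claimed to work exactly this way.

```python
import math
from collections import Counter


def grid_counts(points, cell_size):
    """Count points per square grid cell.

    points: iterable of (x, y) coordinates in any planar unit.
    cell_size: edge length of a cell in the same unit.
    Returns a Counter keyed by (column, row) cell index.
    """
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


def classify(count, upper_breaks):
    """Return the class index of a count, given ascending upper class limits."""
    for index, upper in enumerate(upper_breaks):
        if count <= upper:
            return index
    return len(upper_breaks)  # open-ended top class
```

The class index returned by classify can be used to look up a step in a value
or spectral colour scheme, one colour per grid cell.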


2.3 Evaluation of Existing VGI Quality Visualisations

From the above analysis, it becomes evident that VGI quality assessment has
been conducted per feature or per area (grid cell or administrative area) and
that this pattern is followed for VGI quality visualisation as well. The
visualisation of VGI quality, as it appears in the studies mentioned above, can be
characterised as cartographically poor. Although a number of methods for quality
visualisation exist in the cartographic literature (see Section 4), only a few of
them have been applied. Most cases use only colour schemes based on hue
and value. Additionally, quality visualisation is notably presented separately,
independently from the data, offline and asynchronously. Thus, it does not
permit quality judgement while looking at the data, and it obscures data
visualisation, as attribute information is lost. With poor symbolisation or design
choices, quality visualisation leads to more, rather than less, uncertainty about
the data depicted (MacEachren et al., 2005). Practices for VGI quality
visualisation need to be revised and updated based on a framework for VGI quality
visualisation.


3  A  Framework  for  VGI  Quality  Visualisation 

The scope of this section is to discuss in detail the components of the
framework for VGI quality visualisation presented in Section 1. Each component is
analysed in order to present its contribution to quality visualisation. Finally, a
number of guidelines are proposed that can help the design of a VGI quality
visualisation environment.


3.1  VGI  Data  Quality 

The nature of VGI datasets - see Chapter 2 by See et al. (2017) and Chapter 3
by Mooney and Minghini (2017) - and their quality aspects play an important
role in the choices regarding visualisation. Past research (Buttenfield and
Beard, 1994; Buttenfield and Weibel, 1988; MacEachren, 1992; MacEachren,
1995) has shown that the selection of a visualisation method should be related
to the quality element represented and the measure/indicator used. The
main information that users need about VGI quality focuses on fitness-for-use.
Since fitness-for-use depends on a number of quality elements (such as
positional accuracy, completeness, currency, etc.) and on criteria related to
the planned use of the data, users may need to be presented with visualisations
for a number of data quality measures and indicators in order to reach
a decision on the suitability of a dataset. As a result, in order for users to fully
benefit from the provision of various measures and indicators, a wide variety
of visualisation methods should be provided, enhanced with interactivity to
maximise functionality.

The nature of the quality indicator or measure affects the functionality of the
visualisation as an awareness tool or as an exploratory tool. For instance, quality
measures that are computed through comparison with authoritative data,
although descriptive, cannot be used to support the quality awareness role: they
are computed offline, post-processing is needed and they depend on the
existence of reference data, which is not always available. They are, however,
considered valuable for VGI quality exploration by scientists. Visualisation, as a
VGI quality awareness tool, requires quality indicators that can be calculated
in real time from the VGI data or other available data, for simultaneous
provision to the user.

Therefore,  in  order  to  provide  for  good  understanding  of  quality  and  fitness- 
for-use  judgement,  one  should  provide  a  number  of  data  quality  measures  and 
indicators  along  with  visualisation  support.  Specific  visualisation  functionality, 
e.g.  quality  awareness  or  quality  exploration,  is  made  possible  by  selecting  the 
appropriate  quality  descriptors,  as  explained  above. 


3.2  Quality  Visualisation  Methods 

Quality visualisation can be handled as the cartographic portrayal of any other
spatial phenomenon. Thus, the analysis of the measure/indicator and the values
that describe it, of the classification according to geometry (point, line,
area), and of the measurement scale (continuous or discrete; ordinal or
categorical) will lead to the selection of the appropriate visualisation method.
VGI data visualisation and quality visualisation should work together as a
whole (holistic/symbiotic approach) and balance simplicity, detail, richness of
visualisation and ease of understanding. Technical feasibility should also be
considered. Methods should not be too complex, so that they can be applied
easily within the framework of a VGI project.

One of the most attractive developments in cartography based on modern
technologies is 3D mapping. 3D maps pose new challenges to cartographers,
as these representations must be very well adapted to the context of
the user and must provide understandable and easy-to-perceive information
and messages. Some VGI data can be mapped in 3D. The third dimension is a
growing topic in OSM (OpenStreetMap Wiki, 2017); for example, a number of
web pages provide maps with 3D rendering of buildings. Data quality
visualisation methods are considered to be adaptable to the 3D context, yet the subject
poses significant challenges (Bandrova et al., 2012; Jones, 2011; Pang et al., 1997).

A  detailed  review  of  available  quality  visualisation  techniques  emerging  from 
cartography,  as  well  as  guidelines  to  select  the  appropriate  methods  taking  into 
account  usability  and  user  experience,  is  presented  in  Section  4. 


3.3  Users 

An important factor for successful map design is to know who the audience is.
Regarding VGI, there will always be a group of unknown users despite the effort
of producers to register volunteers and involve them in user groups (Vullings
et al., 2015). Since cartographic representations can only be optimised if end
users and data types are known (Kunz et al., 2011), it is impossible to provide
successful VGI quality visualisations for all users. Users with no knowledge of
visualisation quality will work with a map differently from a professional who
has been dealing with the issue for some time (Brus and Pechanec, 2015).
Fortunately, the dual role of visualisation as a communication and as an
exploration tool (DiBiase et al., 1992) can serve all VGI user needs. The idea of levels
of uncertainty visualisation in relation to the experience and needs of the user
is discussed in Beard and Mackaness (1993). Three levels are distinguished:
the first level is simply a notification of poor data quality, with ‘poor’ defined
on the basis of a predetermined threshold; the second level adds detail, such as
the location and type of quality conflict, etc.; and the third level focuses on
giving users methods for investigating the reasons for uncertainty. A VGI quality
visualisation environment should provide for all users and take into account
different user needs and characteristics. Based on this context, VGI quality
visualisation design should address the profiles of at least two user groups, which
are opposites in terms of experience and knowledge: the novice user profile and
the expert user / scientist profile.
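
The three levels distinguished by Beard and Mackaness (1993) can be read as
progressively richer answers to the same query. The Python sketch below is one
possible interpretation under stated assumptions: the feature dictionaries and
their keys ('id', 'location', 'issue', 'quality', 'indicators'), the 0 to 1
quality score and the function name quality_report are all hypothetical.

```python
def quality_report(features, threshold, level):
    """Summarise poor-quality features at one of three levels of detail.

    features: list of dicts with the hypothetical keys 'id', 'location',
        'issue', 'quality' (score in 0..1) and 'indicators' (raw values).
    threshold: scores below this predetermined value count as 'poor'.
    level: 1 (notification), 2 (adds location and type), 3 (adds raw
        indicators so that the reasons can be investigated).
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")

    poor = [f for f in features if f["quality"] < threshold]

    # Level 1: a simple notification that poor data are present.
    report = {"poor_quality": bool(poor)}

    # Level 2: where the problems are and of what type.
    if level >= 2:
        report["conflicts"] = [(f["id"], f["location"], f["issue"]) for f in poor]

    # Level 3: the underlying indicator values for further investigation.
    if level >= 3:
        report["indicators"] = {f["id"]: f["indicators"] for f in poor}

    return report
```

A novice interface might only ever request level 1, while an expert interface
could expose level 3.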


3.4  Medium/Visualisation  Environment 

Among the quality visualisation methods addressed in the literature, a
frequently repeated idea is that users need control over depictions of quality
(MacEachren et al., 2005). Cliburn et al. (2002) proposed to help users cope
with the complexity of the display by providing interactivity. Interactive
functionality can facilitate the interpretation of visualisation and cater for
the different needs of heterogeneous user groups. A number of choices can
be available in interactive functionality: selection among different
cartographic methods for the visualisation (see Section 4); or customisation of
the selected visualisation method according to user needs, e.g. configuration
of visual variables such as colour schemes based on hue and value, symbol
sizes, and data quality value classification, among others. Once the
visualisation meets the requirements of the user (Kunz et al., 2011), the cartographic
representation can be analysed visually, or, in addition, explored with the
help of further functionality (e.g. a tooltip window displaying detailed
information). Of course, only expert users can make good use of strong
interactivity, whereas novice users may be restricted to graphic modification of
visualisations.

Graphical user interfaces (GUIs) are a powerful tool in visualisation support
as they enhance functionality, through e.g. the graphic modification of
visualisations, screen division and simultaneous display of data and quality
visualisation in neighbouring windows, interactive tools such as a ‘quality slider’ that
controls the appearance of the data in relation to quality, buttons that control
whether different components - data or quality - should be visually dominant,
etc. Functionality classification, based on Cron et al. (2007), includes: general
functions, functions for navigation, didactic functions, cartographic and
visualisation functions and GIS functions. Cartographic and visualisation
functionality (Cron et al., 2007) refers to map manipulation, redlining (addition of
drawings, labelling, and comments) and exploratory data analysis.
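
To make the idea of a ‘quality slider’ more concrete, the following Python
sketch fades features whose quality falls below the slider position instead of
hiding them. It is a sketch under assumptions: the 0 to 1 quality score, the
feature keys 'id' and 'quality', the faded opacity of 0.2 and the function name
apply_quality_slider are invented for illustration and do not describe an
existing tool.

```python
def apply_quality_slider(features, slider_value, faded_opacity=0.2):
    """Map each feature id to a drawing opacity for a given slider position.

    Features whose quality score reaches the slider value are drawn fully
    opaque; all others are faded so that they stay visible but recede.
    """
    opacities = {}
    for feature in features:
        meets_threshold = feature["quality"] >= slider_value
        opacities[feature["id"]] = 1.0 if meets_threshold else faded_opacity
    return opacities
```

Fading rather than removing features keeps the data visible while the slider
moves, which is in line with displaying data and quality together.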

Apart from the need for a visualisation method to be understandable by any
user, another important factor is the technical feasibility of implementing the
visualisation method (Jones, 2011). Technological advances
can now provide geospatial applications with interactivity, flexibility and user
friendliness so as to create a suitable environment for VGI quality
visualisation. The integration of these qualities in the GUIs of a VGI project
(irrespective of the device used) will further enhance the effort to communicate quality.

As a result, the design of the visualisation environment should strike a
balance between interactivity, cartographic and visualisation functionality, and
technical feasibility, taking into account the expected functionality, e.g. quality
awareness or quality exploration, and the user profile, e.g. novice user or expert
user/scientist.


3.5  Guidelines  for  VGI  Quality  Visualisation  Implementation 

From the above analysis of the framework, a number of guidelines emerge
that can help in the design of VGI quality visualisation:

• Various data quality measures and indicators should be provided to the user
in order to achieve successful communication of quality and permit a
successful fitness-for-use judgement;

•  The  nature  of  the  VGI  pool  of  users  should  be  addressed  and  user  needs 
and  characteristics  taken  into  account;  in  particular,  user  profiles  on  the 
opposite  ends  of  the  experience  and  knowledge  spectrum  (the  novice  user 
and  the  expert  user  /  scientist)  should  be  taken  into  account; 

• Visualisation functionality, e.g. quality awareness or quality exploration,
should be provided by selecting the appropriate quality descriptors or
measures;

•  Visualisation  techniques  and  guidelines  emerging  from  cartography  that 
take  usability  and  user  profile  into  account  should  be  applied;  and 

• A visualisation environment that balances interactivity with cartographic
and visualisation functionality and technical feasibility should be designed.


4 A Review of Methods for Quality Visualisation

Research in the field of quality visualisation for geospatial data has been
ongoing for the last 30 years (Aerts et al., 2003; Buttenfield and Beard,
1994; Buttenfield and Weibel, 1988; Drecki, 2002; Goodchild et al., 1994;
Leitner and Buttenfield, 2000; MacEachren, 1992; MacEachren et al., 2005;
McGranaghan, 1993; Van der Wel et al., 1998; Wittenbrink et al., 1996; Zuk
and Carpendale, 2006). In this section, papers about geographic data
uncertainty and quality visualisation are reviewed and summarised, in order to
compile a catalogue of methods/techniques that can be applied to VGI quality
visualisation. This review may act as an informative guide for designing a
VGI quality visualisation.

The main challenge of any visualisation effort is to select the most
appropriate method. Symbolisation is based on visual variables introduced by
Bertin (1983). These include location; size; shape; orientation; colour hue; colour
value (or brightness (Wilkinson, 2005), or lightness (Slocum et al., 2003));
texture (grain); colour saturation; arrangement (Morrison, 1974); clarity
(fuzziness); resolution (of boundaries and images); and transparency (MacEachren,
1992). MacEachren (1995) describes the syntax for the above visual variables,
giving a three-step rating of good, marginal and poor, for use with numerical,
ordinal and categorical data (Roth, 2015).

In this chapter, visualisation methods are presented in tables according to the
classification that appears in the literature (Gershon, 1998; Kinkeldey et al.,
2014a; MacEachren et al., 2005). First, intrinsic visualisation methods are
presented in Table 1. Intrinsic visualisation methods (Howard and MacEachren,
1996) alter the symbology used to portray data values to additionally represent
quality, through manipulation of a visual variable that has not been used to
portray data values, e.g. the colour value. Table 1 presents the visual variables
that can be used to portray quality. In order to make the functionality of visual
variables understandable to non-experts, the notion of a visualisation
metaphor was introduced by MacEachren (1992), was adopted by other
researchers (e.g. Kardos et al., 2006) and is also integrated in Table 1. A number of
the visual variables presented in Table 1 can be used in combination with hue
(Hengl, 2003; Howard and MacEachren, 1996), resulting in combinations such
as hue, saturation and value or value and hue, in order to form colour schemes,
e.g. sequential colour schemes, diverging colour schemes, and qualitative
colour schemes (Brewer, 1994; Harrower and Brewer, 2003). Such schemes can
be applied in bivariate representations, which depict data and quality together,
treating quality as a second variable (Kunz et al., 2011; MacEachren et al.,
2005). All intrinsic approaches have in common the fact that slight changes in
uncertainty can be difficult to identify, especially for datasets with great
variability (Kunz et al., 2011). However, this can be mitigated with the help of
interactive functionality.
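
A bivariate, intrinsic symbolisation of the kind described above can be
illustrated with a short Python sketch that uses hue for the data class and
lightness for quality. The mapping is an assumption made for illustration
(lower quality is drawn paler, with lightness running from 0.5 to 0.9), as is
the function name bivariate_colour; only the standard-library colorsys module
is used.

```python
import colorsys


def bivariate_colour(hue_degrees, quality):
    """Return a hex colour: hue encodes the data class, lightness the quality.

    hue_degrees: hue of the data class on the colour wheel (0..360).
    quality: score in 0..1; values outside the range are clamped.
    Assumed mapping: quality 1.0 gives the pure hue (lightness 0.5),
    quality 0.0 gives a pale tint of the same hue (lightness 0.9).
    """
    quality = max(0.0, min(1.0, quality))
    lightness = 0.9 - 0.4 * quality
    red, green, blue = colorsys.hls_to_rgb((hue_degrees % 360) / 360.0, lightness, 1.0)
    return "#{:02x}{:02x}{:02x}".format(
        round(red * 255), round(green * 255), round(blue * 255)
    )
```

With this mapping, a red data class of full quality is drawn as #ff0000 and the
same class with the lowest quality as the pale tint #ffcccc, so the data value
stays recognisable while quality fades it.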


Table 1: Intrinsic visualisation methods.


Table 2: Extrinsic visualisation methods.

| Method | Description | Visual variable to portray quality | Examples in |
|---|---|---|---|
| Glyphs | graphical objects with 2D or 3D geometry, such as circle, sphere, vertical bar, pyramid, square etc. | size, colour value, saturation etc. | McKenzie et al. (2016); Pang (2001); Slocum et al. (2003) |
| Contours | lines that represent same values (isolines) of quality | size (thickness), colour value (brightness), connectedness, colour hue, texture etc. | DiBiase et al. (1992); Howard and MacEachren (1996); Pang (2008) |
| Grids / Tessellations | a grid or other tessellation e.g. hexagons overlaid on the data | size (grid size), texture (grid pattern), grid outline (boundaries) etc. | Cedilnik and Rheingans (2000); Kardos et al. (2008); Kinkeldey et al. (2014b); Mullins (2014); Pang (2008) |

Extrinsic techniques (Howard and MacEachren, 1996), which introduce new
objects to depict quality, e.g. glyphs, grids, etc., that work independently of the
existing symbols for data values, are presented in Table 2. These new objects
portray quality using appropriate visual variables such as size, colour value,
texture, etc.

In  terms  of  visual  organisation,  extrinsic  visualisation  methods  (Gershon, 
1998;  Howard  and  MacEachren,  1996)  can  be  coincident,  if  data  and  quality  are 
represented  in  one  map,  or  adjacent,  if  they  are  represented  in  adjacent  maps. 
(Intrinsic  visualisations  are,  by  definition,  coincident.) 

Finally, quality visualisation methods can be static, like the ones already
presented, or dynamic. Dynamic representations are presented in Table 3.
Animation is related to three basic design elements, or ‘dynamic variables’: scene
duration, rate of change between scenes and scene order (DiBiase et al., 1992).
The range of possible dynamic approaches is wide because elements from
animation and interaction can be combined in numerous ways. Intrinsic and
extrinsic visualisation methods are static, but they can also be transformed into
dynamic methods through animation.


4.1  Quality  Visualisation  Methods  and  VGI  Data 

A number of studies that present methods for quality visualisation have also
studied their usability (Aerts et al., 2003; Cliburn et al., 2002; Fisher, 1993;
Gershon, 1992; Kardos et al., 2006; Kinkeldey et al., 2014a; Lodha et al., 1996;
MacEachren et al., 1998; Pang, 2001; Schweizer and Goodchild, 1992). In the
following paragraphs, a number of guidelines for VGI quality visualisation in
relation to user experience are discussed, once again taking the two main user
profiles into account: the novice user and the expert user/scientist.

Table 3: Dynamic visualisation methods.

| Dynamic Variable | Quality is represented by | Metaphor | Examples in |
|---|---|---|---|
| Sound | sonic variables | a low pitch sound depicts good quality and a high pitch sound, bad quality. It can be cursor-driven. | Fisher (1994); Krygier (1994); Lodha et al. (1996) |
| Animation | scene duration | long duration of an object on the screen depicts good quality | Fisher (1993); MacEachren et al. (1998) |
| | rate of change between scenes | questionable quality is portrayed with rapid blinking | Evans (1997); Fisher (1993); Monmonier and Gluck (1994); Kardos et al. (2006) |
| | spatially variable blurring | questionable quality is portrayed with very blurred regions | Gershon (1992); MacEachren et al. (2005) |
| | scene order | multiple representations: a number of possible data values are represented, and the existence of many different values creates questions on quality | Bastin et al. (2002); Ehlschlaeger et al. (1997) |

Which method to use (intrinsic vs. extrinsic): Slocum et al. (2003) found that
intrinsic techniques give a better overview of uncertainty, but that in-depth
analysis is easier with extrinsic techniques. This is in agreement with Kunz et al.
(2011), who noted that none of the intrinsic approaches can successfully
portray the variability in quality. As a result, it is proposed to use intrinsic methods
as awareness tools for novice users and extrinsic methods as exploratory tools
for the experts.

Which visual variable to use in intrinsic visualisations: Regarding the
intuitiveness needed for novice users (MacEachren et al., 2012), colour value, fog
(transparency) and clarity (fuzziness) visual metaphors are preferable. On the
other hand, expert users prefer transparency or saturation (Kunz, 2011). In
terms of user performance, Kinkeldey et al. (2014a) conclude that colour
saturation is not recommended, while colour hue and value as well as transparency
provide better alternatives. Also, texture on colour fill and resolution lead to
good results and thus can be used with intrinsic visualisations.

Which variable to use in extrinsic methods: Studies on extrinsic displays
(Kinkeldey et al., 2014a) highlight the potential of glyph and grid-based techniques
for quality representation. According to a different usability study (Senaratne et
al., 2012), contours are considered the best method.

Which technique (coincident vs. adjacent) to use: Research suggests that
both coincident and adjacent approaches have their applications. According
to Kinkeldey et al. (2014a), coincident maps can be seen as the preferable
option because the integration of uncertainty into the display makes it easier
to retrieve data and quality simultaneously. This is why they are advised for
novice users, in order to ensure that quality information will not escape their
attention. The problem of increased complexity, which may be an obstacle for
the novice user, can be minimised with good cartographic design and
interactivity (e.g. use of on/off buttons). Expert users can work with both techniques
and should be able to decide which one to use.

Static or dynamic: There is evidence (Kinkeldey et al., 2014a) that animated
views have the potential to successfully represent quality when static solutions
are not feasible, but there is little evidence that they perform as well as or
better than more traditional static depictions when these are available.
Regarding dynamic techniques, animations are the most promising ones as they
can be used to attract the attention of the user (Gershon, 1992; Blenkinsop
et al., 2000). Thus, dynamic visualisations can be used with novice users in
order to highlight VGI quality issues and increase awareness. Expert users
can again work with all of the methods, and they should be able to decide
which one to use.

Scale: Finally, one should consider the dynamic scale of the VGI display
environment, e.g. the OSM web page. The scale plays an important role in the
selection of an appropriate visualisation method, as intrinsic methods are best for
larger scales and extrinsic methods such as grid and contours are preferable for
a global quality visualisation at smaller scales.
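
The guidelines of this section can be summarised as a simple lookup, sketched
below in Python. The function name recommend_visualisation, the dictionary keys
and in particular the scale limit of 1:50,000 separating ‘larger’ from
‘smaller’ scales are assumptions introduced for illustration; the text above
gives no numeric threshold, and the profile advice and the scale hint are
reported side by side rather than merged.

```python
def recommend_visualisation(profile, scale_denominator, large_scale_limit=50_000):
    """Return the guideline-based advice for a user profile and a map scale.

    profile: 'novice' or 'expert'.
    scale_denominator: e.g. 10000 for a map at 1:10,000.
    large_scale_limit: hypothetical denominator up to which a map counts
        as large scale.
    """
    if profile == "novice":
        advice = {
            "method": "intrinsic",
            "role": "awareness",
            "layout": "coincident",
            "dynamic": "animation may be used to attract attention",
        }
    elif profile == "expert":
        advice = {
            "method": "extrinsic",
            "role": "exploration",
            "layout": "coincident or adjacent (user's choice)",
            "dynamic": "user's choice",
        }
    else:
        raise ValueError("profile must be 'novice' or 'expert'")

    # Scale guidance is kept separate from the profile guidance.
    if scale_denominator <= large_scale_limit:
        advice["scale_hint"] = "intrinsic methods"
    else:
        advice["scale_hint"] = "extrinsic methods (grid, contours)"
    return advice
```

Where the profile advice and the scale hint disagree, the choice is left to the
designer, since the guidelines above do not state which should take precedence.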

5  Conclusions  and  Future  Plans 

From the above analysis, it is clear that there is an emerging need for VGI data
quality visualisation. A number of measures and indicators for VGI quality
(Antoniou and Skopeliti, 2015) have been proposed, there is knowledge on
quality visualisation (MacEachren et al., 2005; Kinkeldey et al., 2014a) and the
technology is now available. Since the crowd encompasses a diverse pool of
users, VGI quality visualisation should cater for different needs and exhibit
variable functionality, operating as an awareness tool for the novice user as well
as an exploration tool for expert users / scientists. A framework for
successful VGI quality visualisation was presented, incorporating factors such as the
nature of VGI data quality, user profiles, methods for quality visualisation of
spatial data, and the visualisation environment.

Effective VGI quality visualisation will have a positive impact on a VGI
project's overall quality: quality visualisation will help users decide on
fitness-for-use, the quality of contributions will improve, the reputation of VGI will rise
as quality is better communicated through visualisation, quality awareness will
increase, sceptical users will change their opinion (since most of the time VGI
quality is better than expected) and quality metadata hidden in data will be
revealed, e.g. by utilising information from history files or elapsing tags in the
case of OSM. Thus there are only merits to VGI quality visualisation for both
VGI data and VGI projects.

VGI quality visualisation is also of interest to National Mapping and
Cadastral Agencies (NMCAs) that embrace VGI. Today many NMCAs encourage
and welcome VGI contributions in their geoportals (see Chapter 13 by
Olteanu-Raimond et al., 2017a). Volunteers are playing an increasingly important
role in ensuring that authoritative sources of geographic information are
accurate and kept up-to-date. VGI data and authoritative data can be visualised
in the geoportal of NMCAs and one of the aforementioned methods can be
employed to portray quality. Data will be enhanced, but at the same time the
user will be informed about data quality. Whereas authoritative data can be
better in terms of quality elements such as homogeneity (Olteanu-Raimond et al.,
2017b), VGI may prove to be better in terms of completeness (Vandecasteele
and Devillers, 2015), currency (Goodchild and Glennon, 2010) and positional
accuracy (Haklay, 2010). These differences in quality may only become
apparent, especially to non-experts, through visualisation.

For the future development of this research topic, it is proposed to create
a prototype for VGI quality visualisation, combining existing measures and
indicators (Antoniou and Skopeliti, 2015) of VGI quality with a variety of
visualisation methods (MacEachren et al., 2005; Kinkeldey et al., 2014a). For the
choice of suitable visualisation methods for the crowd, it is important to
confirm the usability and effectiveness of methods with the pool of VGI users. The
prototype can be used to conduct a user survey that records and evaluates the
crowd response on VGI quality visualisation and verifies methods in practice.
Knowledge about VGI quality visualisation as it relates specifically to the crowd
acquired through a user survey can then be implemented in the development of
an interactive visualisation environment in the framework of any VGI project.


Notes 

1 http://wiki.openstreetmap.org/wiki/Quality_assurance#Visualisation_tools

2 A heat map utilizes a colour scheme that is part of the colour spectrum; it is
called a heat map because this colour scheme is traditionally used in
cartography for the visualisation of temperature.


Abstract

Volunteered Geographic Information (VGI) has become a rich and well-established
source of geospatial data. From the popular OpenStreetMap (OSM) to
many citizen science projects and social network platforms, the amount of
geographically referenced information that is constantly being generated by
citizens is burgeoning. The main issue that continues to hamper the full
exploitation of VGI lies in its quality, which is by its nature typically undocumented and
can range from very high quality to very poor. A crucial step towards
improving VGI quality, which impacts on VGI usability, is the development and
adoption of protocols, guidelines and best practices to assist users when collecting
VGI. This chapter proposes a generic and flexible protocol for VGI data
collection, which can be applied to new as well as to existing projects regardless
of the specific type of geospatial information collected. The protocol is meant
to balance the contrasting needs of providing VGI contributors with precise
and detailed instructions while maintaining and growing the enthusiasm and
motivation of contributors. Two real-world applications of the protocol are
presented, which guide the collection of VGI in, respectively, the generation and
updating of thematic information in a topographic building database and the
uploading of geotagged photographs for the improvement of land use and land
cover maps. Technology is highlighted as a key factor in determining the
success of the protocol implementation.


Keywords 

Volunteered  Geographic  Information,  protocol,  best  practices,  data  collection, 
data  quality. 


1  Introduction  and  Background 

Volunteered  Geographic  Information  (VGI)  represents  an  important  new 
source of citizen-contributed data (Goodchild, 2007), as outlined in detail in
Chapter  2  (See  et  al.,  2017).  VGI  can  be  a  complementary  source  of  information 
to  authoritative  data  such  as  detailed  road  networks  and  building  footprints, 
and  may  be  the  only  source  of  map  data  usable  after  a  natural  disaster  or  crisis 
event  has  occurred,  for  example  in  the  case  of  mapping  efforts  by  the  Humani¬ 
tarian  OpenStreetMap  Team  (HOT)1.  Yet  the  main  barrier  to  the  widespread 
use  of  VGI  remains  the  assessment  and  documentation  of  data  quality  (John¬ 
son  and  Sieber,  2013;  Olteanu-Raimond  et  al.,  2017a).  This  is  particularly  true 
when  quality  compliance  is  an  essential  requirement  for  VGI  exploitation,  such 
as  for  its  exploitation  by  governments,  National  Mapping  Agencies  (NMAs), 
public  bodies  (fire  fighters,  civil  protection  etc.)  and  private  companies,  which 
make use of geospatial data to take decisions. From this perspective, an analysis
of  VGI  exploitation  by  NMAs  is  made  in  Chapter  13  (Olteanu-Raimond  et  al., 
2017b),  while  some  guidance  on  VGI  data  quality  assessment  is  provided  in 
Chapter  7  (Fonte  et  al.,  2017).  The  latter  chapter  describes  measures  and  indica¬ 
tors  that  are  generally  applied  to  VGI  after  the  data  have  been  collected.  Instead, 
more  attention  should  be  placed  on  how  to  ensure  high-quality  data  collection 
during  the  data  capture  phase.  One  approach  for  doing  this  is  to  develop  and 
adopt  generic  and  flexible  guidelines,  best  practices  and  protocols  for  VGI  col¬ 
lection.  While  guidelines  and  best  practices  refer  to  a  set  of  rules,  instructions, 
suggestions,  recommendations  or  situations  that  indicate  how  VGI  should  be 
collected,  perhaps  by  reference  to  examples  or  ideal  cases,  protocols  can  be 
defined  as  strict  sequences  of  instructions  regulating  VGI  collection.  Specific 
attention  should  be  paid  to  the  structure  and  complexity  of  such  guidelines, 
best  practices  and  protocols;  in  particular,  they  should  not  discourage  citizens 
from  contributing,  while  simultaneously  ensuring  that  the  collected  data  are  of 
an acceptable quality for the purpose of the specific VGI project. Equally
importantly, they should facilitate the reuse of VGI for projects and applications
other than the one(s) for which it was originally collected.

The  relevance  of  establishing  protocols  in  VGI  projects  and  the  potential  prob¬ 
lems  for  communities  and  society  that  arise  when  these  protocols  are  absent 
have been highlighted by many authors, including Sui (2007), Johnson and Sieber
(2013) and See et al. (2016). In Europe, only a few NMAs have experience
with  using  or  integrating  VGI  in  their  authoritative  datasets  (Olteanu-Raimond 
et  al.,  2017a),  while  protocols  for  VGI  within  NMAs,  governments  or  Com¬ 
mercial  Mapping  Companies  (CMCs)  are  lacking  (Johnson  and  Sieber,  2013). 
Conversely,  as  mentioned  above,  many  authors  have  developed  methodologies 
to  study  the  quality  of  VGI  (after  it  has  been  collected)  and  have  undertaken 
VGI  comparison,  integration  or  conflation  with  data  from  NMAs  and  CMCs 
to  build  more  up-to-date,  accurate  and  complete  datasets  (Girres  and  Touya, 
2010; Haklay, 2010; Ludwig et al., 2011; Al-Bakri and Fairbairn, 2012; Du et al.,
2012; Pourabdollah et al., 2013; Touya et al., 2013; Gao et al., 2014; Jokar Arsanjani
et al., 2015b; Brovelli et al., 2016a; Fan et al., 2016).

To  instruct  users  in  the  production  of  data  that  are  fit-for-purpose,  some  VGI 
projects provide detailed guidelines instead of defining a real protocol. OpenStreetMap
(OSM)2 is the most popular VGI project and one of the most studied
in the literature (Jokar Arsanjani et al., 2015c); it is extensively described
in  Chapter  3  (Mooney  and  Minghini,  2017).  Over  its  more  than  ten  years  of 
life,  there  has  been  a  progressive  development  of  guidelines  about  the  types  of 
geographic  features  that  users  can  create  and  the  attributes  (or  tags)  that  can 
be  attached  to  them.  The  updated  version  of  these  guidelines  is  maintained  in 
a  page3  on  the  OpenStreetMap  Wiki,  while  their  development  and  enrichment 
over  time  is  discussed  in  Chapter  8  (Antoniou  and  Skopeliti,  2017).  It  is  worth 
mentioning  that,  although  a  real,  strict  protocol  for  creating  OSM  data  does 
not  exist  and  indeed  there  is  considerable  freedom  left  to  the  contributors, 
several studies have documented the high quality of OSM crowdsourced datasets
(see e.g. Neis et al., 2011; Fan et al., 2014; Dorn et al., 2015; Jokar Arsanjani
et al., 2015a). Another example of a VGI project that provides guidelines is the
National  Map  Corps4,  a  mapping  crowdsourcing  programme  similar  to  OSM 
that  supports  the  Geospatial  Information  Office  of  the  U.S.  Geological  Survey 
(USGS)  in  gathering  rapidly-changing  landscape  feature  data  for  The  National 
Map  (Bearden,  2007). 

In other cases, protocols have been designed to assist volunteers in contributing
high-quality data that fit the VGI project's needs and purposes. A
well-known example is that of Geo-Wiki (Fritz et al., 2012), which is an online
crowdsourcing  platform  where  volunteers  -  provided  with  a  strict  and  detailed 
protocol  -  are  asked  to  use  very  fine  spatial  resolution  imagery  to  gather  infor¬ 
mation  on  land  cover  and  land  use  to  improve  global  land  cover  maps.  Simi¬ 
larly,  an  extensive  and  detailed  protocol  for  digitising  old  French  maps  was  cre¬ 
ated  and  enriched  through  user  collaboration  on  a  dedicated  platform5,  which 
allowed  for  consistent  data  records  to  be  maintained  (Perret  et  al.,  2015).  In  the 
same  way,  the  GeoPeuple  project  used  protocols  to  create  topographic  vector 
datasets  from  old  French  maps  for  analysing  population  growth  (Ruas  et  al., 
2014).  The  Degree  Confluence  Project6  is  an  example  of  a  project  applying  a  pro¬ 
tocol  to  collect  photographs  of  the  landscape  from  all  the  intersection  points  (or 
confluences)  of  one  degree  latitude-longitude  around  the  globe.  Volunteers  are 
asked  to  take  either  photographs  in  the  four  cardinal  compass  directions  (north, 
south,  east,  west)  or  one  or  more  panoramic  views  from  the  intersection,  one 
general  photograph  taken  within  100  metres  of  the  confluence,  and  one  photo¬ 
graph  of  the  GPS  used.  Users  then  upload  all  the  photographs,  along  with  a  text 
describing  the  landscape  as  well  as  their  journey  to  the  confluence  point  (Fritz 
et  al.,  2009).  In  principle,  these  photographs  may  then  be  reused  in  another  VGI 
project  to  yield  reference  data  for  map  validation  (Foody  and  Boyd,  2012). 

The addition of such protocols in VGI projects usually comes with trade-offs;
in other words, as the complexity or length of the protocol increases, the
participation or retention rate may become lower (see Chapter 5 (Fritz et al.,
2017) on motivation and participation for examples). A contrary example to
the  Degree  Confluence  Project  in  the  same  domain  of  VGI  photograph-based 
initiatives  is  represented  by  Flickr  and  Panoramio.  These  are  VGI  photograph 
sharing  sites  that  do  not  provide  any  protocols  regarding  how  the  photographs 
should  be  taken  or  what  information  should  be  added.  Users  can  add  a  title,  a 
comment/description, one or more tags and the location, but these are optional.
The  lack  of  protocols  is  reflected  in  the  very  high  participation  rates  (Michel, 
2015;  Panorank,  2016),  but  also  in  the  variable  quality  of  the  contributions  when 
considering  them  for  applications  such  as  land  cover  and  land  use  mapping  (see 
e.g.  Leung  and  Newsam,  2012;  Estima  and  Painho,  2014;  Antoniou  et  al.,  2016). 

To  show  an  example  of  the  variability  of  the  photographs  in  terms  of  tags,  a 
random  sample  of  around  130,000  geotagged  photographs  that  were  uploaded 
to Flickr and Panoramio for the London region in May 2015 was analysed.
The  frequency  of  the  number  of  tags  associated  with  the  photographs  was 
computed  and  plotted  in  Figure  1  as  a  function  of  increasing  numbers  of  tags. 
The largest single group of photographs (almost one third of the total) has no
tags associated with them. In addition, the numbers of photographs with one to
seven tags differ only within the limits of random variation (although some trends
can be spotted; for instance, users who decide to include tags usually prefer to
append two to six tags instead of just one). Conversely, the frequency of
photographs with eight or more tags shows an almost steady decrease.
This  can  be  seen  as  a  proxy  for  the  following  relationship:  the  more  freedom 
users  have  in  terms  of  contributions,  the  more  heterogeneous  the  contributions 
will be, accompanied by a likely decrease in average quality in terms of their
use in further applications. Hence guidelines and protocols could
substantially increase the exploitation of VGI for applications not even considered
by the person collecting the data.
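
The tag-frequency computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to produce Figure 1: the function names, the `photos` records and the `tags` field are hypothetical stand-ins for whatever metadata a photo-sharing service returns, and the sample data are made up.

```python
from collections import Counter


def tag_count_frequency(photos):
    """Return {number_of_tags: number_of_photos} for a list of photo records.

    Each record is assumed (hypothetically) to be a dict with a 'tags' list;
    a missing or empty list counts as zero tags.
    """
    return Counter(len(photo.get("tags") or []) for photo in photos)


def share_without_tags(photos):
    """Fraction of photos that carry no tags at all (0.0 for an empty sample)."""
    if not photos:
        return 0.0
    return tag_count_frequency(photos)[0] / len(photos)


# Tiny made-up sample, only to show the shape of the input.
sample = [
    {"id": 1, "tags": []},
    {"id": 2, "tags": ["london", "bridge"]},
    {"id": 3},  # no 'tags' key at all
    {"id": 4, "tags": ["park", "tree", "spring"]},
]

freq = tag_count_frequency(sample)
for n_tags in sorted(freq):
    print(n_tags, freq[n_tags])
```

Plotting `freq` against the number of tags would give a chart of the kind described for Figure 1, and `share_without_tags` gives the proportion of untagged photographs.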

The  definition  of  protocols  is  more  common  in  other  established  citizen  sci¬ 
ence  activities  where  many  examples  can  be  found.  Accurate  data  collection  by 
citizens  depends  on  the  provision  of  three  elements:  clear  data  collection  proto¬ 
cols,  simple  and  logical  data  forms,  and  support  for  participants  on  protocol  use 
and information submission (Bonney et al., 2009). Pocock et al. (2014) argue that
volunteers  are  more  likely  to  provide  information  following  a  given  standard  if 
the  value  of  their  contribution  is  recognised.  However,  if  the  project  requires  a 
complex  standard  for  gathering  data,  strategies  for  supporting  participants  must 
be  deployed  and  protocols  need  to  be  thoroughly  tested  (Tweddle  et  al.,  2012). 
Acknowledgement  of  participants,  even  simply  demonstrating  the  usefulness  of 
the  data,  plays  a  central  role  in  encouraging  participation  (Pilz  et  al.,  2006). 

As  discussed  in  more  detail  in  Chapter  2  (See  et  al.,  2017),  VGI  can  be  col¬ 
lected  either  actively  or  passively.  While  in  active  projects  users  collect  data  in 
a  conscious  way,  passive  data  collection  happens  when  contributions  are  gath¬ 
ered  without  any  active  engagement  (Haklay,  2013).  Similarly,  Harvey  (2013) 
has  made  a  distinction  between  truly  volunteered  versus  contributed  geo¬ 
graphic  information  (CGI).  While  the  former  refers  to  data  that  are  collected 
with  permission  (such  as  an  edit  in  the  OSM  database),  the  latter  refers  to  data 
collected  as  part  of  an  automated,  open-ended  or  uncontrollable  process  (such 
as  the  tracking  of  mobile  phones).  Information  contributed  to  a  passive  VGI 
project  typically  demands  much  more  processing  to  result  in  meaningful  infor¬ 
mation.  It  is  possible  to  impose  a  set  of  protocols  in  active  VGI,  but  this  is  usu¬ 
ally  not  possible  when  using  passive  VGI  or  CGI,  where  the  data  volumes  are 
often  larger  than  in  active  sources  and  hence  the  data  need  to  be  filtered  if  they 
are  to  be  used.  For  example,  Bordogna  et  al.  (2015)  demonstrated  how  input 
data  can  be  filtered  based  on  minimum  quality  criteria  specified  by  the  user,  for 
example  to  remove  geotagged  photographs  downloaded  from  repositories  such 
as  Flickr  and  Panoramio. 

Figure 1: Frequency of photographs by number of associated tags (total number of
photographs: 130,408; total number of tags: 700,789).

Hence this chapter limits its focus to active VGI projects, where the role
played  by  protocols  can  be  crucial  for  the  quality  of  the  data  collected.  The 
chapter  seeks  to  emphasise  the  need  for  data  collection  protocols  in  VGI  pro¬ 
jects,  and  explores  how  technology  can  be  seamlessly  exploited  to  facilitate 
collection of suitable data. The chapter builds on previous work by
Mooney et al. (2016), who defined a general and flexible protocol for collecting
VGI vector data.

In  Section  2  this  protocol  is  briefly  presented  with  the  idea  of  generalising 
it  to  all  types  of  VGI  projects  and  VGI  data  collected.  In  Section  3  attention  is 
placed  on  which  protocols  are  required  to  meet  minimum  data  quality  require¬ 
ments  and  how  technology  can  play  a  role  in  helping  to  enforce  protocols  in 
a  user-friendly  way.  Section  4  presents  examples  of  how  the  protocol  can  be 
applied  to  two  real-world  applications,  one  related  to  the  collection  of  VGI  vec¬ 
tor  data  and  the  other  to  geotagged  photographs,  and  reflects  upon  the  rela¬ 
tionship  between  protocols  and  volunteer  motivation.  Section  5  concludes  the 
chapter  and  explores  open  questions  as  well  as  the  needs  and  directions  for 
future  research. 


2  A  Reference  Protocol  for  VGI  Collection 

A  generic  protocol  has  been  proposed  and  developed  by  Mooney  et  al.  (2016), 
which  can  be  applied  by  new  VGI  projects  focused  on  vector  data  collection.  It 
can  also  be  used  retrospectively  on  existing  data  in  current  VGI  projects.  This 
protocol aims to be inclusive of all participants in VGI projects, from new to
experienced  VGI  contributors.  By  guiding  contributors  in  the  process  of  VGI 
data  collection,  the  protocol  seeks  to  improve  the  quality  of  data  in  order  to 
both  fit  the  purpose  of  the  specific  VGI  project  for  which  they  are  collected  and 
to  facilitate  their  reuse  within  other,  future  and  potentially  unintended,  appli¬ 
cations.  The  protocol  assumes  only  a  basic  working  knowledge  of  geographic 
information  science  with  basic  file  and  data  handling  skills  from  information 
technology.  The  protocol  has  been  developed  in  a  bidirectional  fashion,  i.e.  the 
authors  have  carefully  considered  mapping  practices  in  bottom-up  approaches 
(VGI, for example) and top-down approaches (like those used by some NMAs).
In  this  way  the  protocol  is  positioned  at  the  intersection  between  these  two 
opposing  approaches  for  the  generation  and  collection  of  geographic  vector 
information. 

The  protocol  should  be  reasonably  general  and  potentially  usable  by  any  VGI 
project  based  on  the  collection  of  vector  data  through  digitisation,  field  survey 
or  bulk  import.  The  authors  have  been  careful  not  to  relate  to  any  specific  VGI 
initiative,  like,  for  example,  OSM,  so  as  to  ensure  the  protocol  has  potential  for 
further/future  customisation  or  improvement  for  other  specific  VGI  projects. 
On  the  other  hand,  it  gives  concrete  technical  recommendations  to  easily  guide 
users  into  a  replicable  step-by-step  data  collection  process  using  the  tools  and 
processes that they currently possess and use. The protocol is formalised into
five  main  stages  as  follows: 

•  Initialisation 

•  Data  Collection 

•  Self-Assessment/Quality  Control 

•  Data  Submission 

•  Feedback  to  the  Community. 

Initialisation  -  This  involves  the  users  of  the  protocol  becoming  familiar 
with  the  VGI  project  and  its  specific  goals  and  objectives.  Familiarisation 
with  the  proper  devices  or  technologies  for  the  tasks  to  be  accomplished  is 
required.  Users  are  encouraged  to  conduct  tests  of  the  data  collection  pro¬ 
cess  to  familiarise  themselves  with  the  process  in  general. 

Data  Collection  -  Users  must  carefully  plan  the  data  collection  process. 
Data  collection  in  this  protocol  can  be  considered  as  one  of  the  following: 
digitisation,  field  survey,  or  bulk  import  of  existing  vector  data.  Obsta¬ 
cles,  problems  and  technical  issues  with  the  specific  type  of  data  collection 
method  must  be  carefully  considered  before  proceeding.  At  all  times  data 
collection must be performed according to the VGI project specifications.

Self-Assessment/Quality Control - This step involves users making their
own  checks  and  assessments  of  their  data  collection  process  and  the  data 
that  have  been  collected.  The  users  should  clearly  state  if  problems  were 
encountered  (for  instance  if  there  was  a  GPS  signal  loss  during  field  col¬ 
lection,  licence  issues  in  bulk  import,  or  poor  resolution  imagery  used  in 
digitisation). 

Data  Submission  -  In  this  step  users  submit,  potentially  using  specific 
application  software,  all  the  data  to  the  project  website  or  application.  Sub¬ 
mission  must  be  successful  and  a  post-submission  check  should  outline  any 
issues  that  were  encountered  during  this  process. 

Feedback  to  the  Community  -  The  protocol  encourages  users  to  use  all 
available  channels  to  provide  feedback  on  their  experiences.  According  to 
Perret  et  al.  (2015),  controlling,  tracking  and  reporting  all  aspects  of  the 
process  is  recommended  in  VGI.  Feedback  includes  any  problems  that  were 
encountered,  issues  that  the  user  resolved,  tips  or  guidance  for  other  users 
in  the  project  etc. 

Although these five main stages of data collection are intended to be sequential,
it is sometimes not easy to establish a well-defined boundary between them. For
example, during data collection the VGI contributors may need to return to
the initialisation stage to gain more insight into the project specifications;
similarly, contributors may realise that quality control is required again after data
submission.
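
One way to picture this mostly sequential flow, with the option of stepping back, is as a small state model. The sketch below is a hypothetical illustration and not part of the protocol of Mooney et al. (2016): the class and function names are invented, and the rule that a contributor may move to the next stage or return to any earlier one (but not skip ahead) is an assumption made for the example.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five protocol stages, numbered in their intended order."""
    INITIALISATION = 1
    DATA_COLLECTION = 2
    SELF_ASSESSMENT = 3
    DATA_SUBMISSION = 4
    COMMUNITY_FEEDBACK = 5


def can_move(current, target):
    """Allow the next stage in sequence or a return to any earlier stage.

    Skipping ahead (e.g. collection straight to submission) is rejected;
    this rule is an assumption of the sketch, not part of the protocol text.
    """
    return target == current + 1 or target < current


class Contribution:
    """Tracks which stage a contributor is in and the path taken so far."""

    def __init__(self):
        self.stage = Stage.INITIALISATION
        self.history = [self.stage]

    def move_to(self, target):
        if not can_move(self.stage, target):
            raise ValueError(
                f"cannot go from {self.stage.name} to {target.name}"
            )
        self.stage = target
        self.history.append(target)


# A contributor starts collecting, then goes back to re-read the project
# specifications before resuming.
c = Contribution()
c.move_to(Stage.DATA_COLLECTION)
c.move_to(Stage.INITIALISATION)
c.move_to(Stage.DATA_COLLECTION)
print([s.name for s in c.history])
```

Recording the history in this way would also make it possible to see how often contributors need to revisit an earlier stage, which may hint at unclear instructions.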
Currently, the protocol described is available to participants in VGI projects
in  the  form  of  a  printed  or  soft  copy  manual  or  document.  The  future  goal  of 
this  work  is  to  communicate  the  concepts  of  the  proposed  protocol  in  order 
to  also  influence  and  guide  future  software  implementations  for  VGI  vector 
data  collection.  As  will  be  shown  through  examples  in  Section  4,  in  order  for 
the  protocol  to  be  effectively  adopted  by  VGI  projects,  the  role  of  technology  - 
and  hence  of  VGI  software  developers  -  is  fundamental.  If  this  protocol  can 
be  directly  implemented  in  software  within  VGI  projects,  the  protocol  can  be 
communicated  to  more  users  and  lead  to  overall  improvements  in  VGI  vector 
data  collection. 


3  The  Role  of  Protocols  for  VGI  Quality 

While  for  authoritative  data  the  evaluation  of  data  quality  is  a  well  established 
subject,  in  VGI  it  remains  rather  elusive  and  vague.  What  is  fundamentally 
different  between  authoritative  data  and  VGI  is  the  data  collection  process. 
For  NMAs  and  CMCs,  rigorous  protocols  and  well  defined  procedures  are  in 
place  that  must  be  followed  by  surveyors.  The  management  of  surveyors,  the 
updating of the protocols and the specifications, and the migration from one
data schema to another are fully controlled. A totally different landscape exists
for  VGI  projects,  in  which  the  enthusiasm  of  an  enormous  but  disparate  set 
of  volunteers  is  the  driving  force.  In  the  case  of  NMAs  and  CMCs  the  logic  is 
simple:  production  protocols  and  specifications  need  to  be  followed,  since  the 
final  product  will  be  examined  for  its  quality  using  various  measures  (such 
as the ISO/TC211 quality framework). Similarly, VGI volunteers should
fully understand that following or ignoring guidelines, best practices
and protocols will have a direct impact on the final spatial product and consequently
on its usability. VGI projects can learn a lot from the advances in
citizen  science.  In  many  cases,  the  quality  of  data  in  citizen  science  is  attained 
through  carefully  designed  and  standardised  protocols  for  participation 
(Kasperowski  and  Kullenberg,  2015).  Standardisation  ensures  the  validity  and 
accuracy  of  contributions  and  classifications  performed  by  citizens  (Cohn, 
2008:  194).  In  this  context,  the  following  subsections  examine,  in  detail,  each 
of  the  five  data  collection  stages  described  above  against  protocol  and  best 
practice  instructions. 


3.1  Initialisation 

One  aspect  that  may  influence  the  quality  of  the  collected  information  is  the 
type  of  instructions  provided  to  the  volunteers  in  the  initialisation  stage. 
While  the  initial  impulse  of  most  trained  surveyors  is  to  employ  the  stand¬ 
ard  data  quality  methods  from  their  field,  when  designing  citizen  science 
projects a different approach for ensuring data quality may be necessary, taking
into consideration the degree of participation and the expectations around
contributors’  skills  (Wiggins  et  al.,  2011).  If  the  VGI  collection  is  made  for  a 
particular  purpose,  then  the  instructions  should  be  detailed  enough  so  that 
volunteers  understand  exactly  what  they  are  expected  to  provide.  However, 
instructions  with  too  much  detail  should  be  avoided,  or  at  least  it  should  not 
be  mandatory  for  the  volunteer  to  go  through  all  the  detail,  because  this  may 
be  demotivating.  The  appropriate  level  of  detail  of  the  instructions  is,  in  some 
circumstances,  not  easy  to  establish.  Therefore,  for  some  types  of  VGI  pro¬ 
jects,  studies  that  identify  how  volunteers  react  to  several  types  of  instructions 
should  be  undertaken,  as  this  reaction  may  have  an  important  impact  on  the 
quality of the generated data (Kerle and Hoffman, 2013). Two practical examples
illustrate the importance of instructions for the quality of generated data.
First, if volunteers need to collect georeferenced photographs, the instructions
should indicate what must be georeferenced: the place from which the photograph
was taken or the phenomenon shown in the photograph. Second, when volunteers
provide a classification of land cover or disaster damage, the instructions
should determine the level of detail required, e.g. the thematic resolution of
land cover classes or the choice of one among several damage classes.


3.2  Data  Collection 

Familiarising  contributors  with  the  project’s  aims  and  goals  may  enhance  their 
awareness,  which,  in  turn,  can  help  to  improve  the  overall  quality  of  the  con¬ 
tributions.  Nevertheless,  crowdsourced  participation  inherently  suffers  from 
biases,  inconsistencies  and  errors;  thus  the  focus  is  on  how  to  exclude  these 
inherent  characteristics  from  the  data  collection  stage.  Participation  biases  can 
result  from  various  causes.  The  digital  divide,  socio-economic  factors,  demo¬ 
graphic  distribution  and  individual  perceptions  can  all  have  an  influence  on 
volunteer  contributions  (Haklay,  2010;  Brovelli  et  al.,  2016b).  Here  protocols 
should  act  preemptively  and  hinder  the  appearance  of  biases.  For  example,  it 
should  be  taken  for  granted  that  individuals  have  their  own  understanding 
and  conceptualisation  of  the  world  that  might  not  coincide  with  a  VGI  pro¬ 
ject’s  mission  or  specifications.  Protocols  should  clearly  state  the  point  of  view 
that  volunteers  should  hold  and  which  processes  they  should  follow  to  collect 
the  data.  In  an  effort  to  relieve  volunteers  from  extremely  detailed  protocols, 
projects  might  provide  a  minimalistic  approach  on  the  procedures  to  follow 
(Batini  et  al.,  2009).  However,  this  hides  two  dangers:  first,  setting  the  bar  lower 
will  probably  result  in  data  that  are  of  lower  quality.  Secondly,  more  active  and 
experienced  volunteers  might  be  discouraged  by  the  approach  taken.  Thus,  the 
challenge  is  to  provide  protocols  and  best  practices  that  will  balance  data  qual¬ 
ity  with  participation. 


The Relevance of Protocols for VGI Collection


3.3  Self-Assessment/Quality  Control 

Data  collection  might  be  influenced  by  factors  that  make  the  process  error- 
prone,  leading  to  errors  and  inconsistencies  in  the  data.  For  example,  weather, 
landscape,  collaboration  with  other  individuals  or  the  instruments  used  are 
just  a  few  factors  that  might  affect  in-situ  measurements.  Here  the  stage  of 
self-assessment  and  quality  control  has  much  to  offer.  Thus,  before  uploading 
data,  each  volunteer  should  self-assess  the  quality  of  their  data  and  perform 
all  possible  quality  controls.  Protocols  should  provide  enough  guidance  and 
explain  common  pitfalls  that  can  lead  to  inconsistencies  and  errors  and  how 
to  avoid  them. 
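
As an illustration of what automated support for this stage might look like, the following sketch runs a few basic checks on field-surveyed points before upload. It is a hypothetical example: the function name, the field names (`lat`, `lon`, `accuracy_m`) and the 10 m default threshold are assumptions chosen for the sketch and are not prescribed by the protocol.

```python
def self_assess(points, max_accuracy_m=10.0):
    """Return a list of human-readable issues found in surveyed points.

    points: list of dicts with 'lat', 'lon' (decimal degrees) and an optional
    'accuracy_m' (reported GPS accuracy in metres). The field names and the
    10 m default threshold are illustrative assumptions.
    """
    issues = []
    seen = set()
    for i, p in enumerate(points):
        lat, lon = p.get("lat"), p.get("lon")
        if lat is None or lon is None:
            issues.append(f"point {i}: missing coordinates")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            issues.append(f"point {i}: coordinates out of range")
        if (lat, lon) in seen:
            issues.append(f"point {i}: duplicate of an earlier point")
        seen.add((lat, lon))
        acc = p.get("accuracy_m")
        if acc is None:
            issues.append(f"point {i}: GPS accuracy not recorded")
        elif acc > max_accuracy_m:
            issues.append(
                f"point {i}: GPS accuracy {acc} m exceeds {max_accuracy_m} m"
            )
    return issues


# Made-up survey with one clean point and several typical problems.
survey = [
    {"lat": 51.5, "lon": -0.12, "accuracy_m": 4.0},
    {"lat": 51.5, "lon": -0.12, "accuracy_m": 25.0},
    {"lat": 95.0, "lon": 0.0},
    {"lon": 1.0},
]
for issue in self_assess(survey):
    print(issue)
```

A list of issues of this kind could be shown to the volunteer before upload and, if the data are still submitted, attached to the contribution so that later users know which problems were encountered.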


3.4  Data  Submission 

The  next  stage  for  which  protocols  should  provide  detailed  guidance  is  data 
submission.  Inevitably,  individual  contributions  are  generally  small,  sparse 
and  fragmented,  and  yet  valuable  for  the  evolution  of  a  crowdsourced  project. 
Active  and  meticulous  data  collection  followed  by  indifferent  data  submission 
(e.g.  just  pressing  the  ‘upload’  button)  might  not  be  sufficient.  Protocols  should 
stress  that  data  submitted  should,  when  possible,  be  validated  against  existing 
observations  or  measurements  so  that  no  vague  or  inconsistent  cases  appear. 
Even  more  important  is  that  an  individual’s  work  does  not  harm  or  destroy 
other  volunteer  contributions.  This  does  not  mean  that  updates  or  alterations 
should  be  avoided,  but  rather  that  it  is  important  to  have  a  balance  between 
contributor  efforts,  a  way  to  evaluate  the  need  for  change,  and  a  versioning  sys¬ 
tem  capable  of  roll-back  to  the  previous  state  of  the  project  if  needed.  Further¬ 
more,  submission  should  not  be  confined  only  to  data:  protocols  should  require 
the  addition  of  metadata  and  supporting/documentation  material  when  pos¬ 
sible.  For  example,  filling  a  form  or  submitting  a  geotagged  image  might  be 
valuable  for  quality  control  by  other  volunteers  or  moderators.  Similarly,  any 
pitfall,  problem  or  simple  concern  encountered  during  the  data  submission 
stage  should  be  appropriately  added  to  the  contributed  data. 
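
The two requirements discussed above, metadata accompanying each submission and the ability to roll back to a previous state, can be sketched as a minimal in-memory store. This is a simplified illustration and not the design of any existing VGI platform: the class name `VersionedLayer`, its methods and the required metadata fields (`contributor`, `method`) are all hypothetical.

```python
import copy


class VersionedLayer:
    """Keeps every submitted state of a feature layer so edits can be undone."""

    def __init__(self):
        self._versions = [{}]   # version 0 is the empty layer
        self._log = [None]      # metadata attached to each version

    @property
    def current(self):
        return self._versions[-1]

    def submit(self, feature_id, attributes, metadata):
        """Store a new version; reject submissions without basic metadata."""
        for key in ("contributor", "method"):
            if not metadata.get(key):
                raise ValueError(f"metadata field '{key}' is required")
        new_state = copy.deepcopy(self.current)
        new_state[feature_id] = dict(attributes)
        self._versions.append(new_state)
        self._log.append(dict(metadata))
        return len(self._versions) - 1   # version number just created

    def rollback(self, version):
        """Restore an earlier state by appending it as a new version."""
        if not 0 <= version < len(self._versions):
            raise IndexError("unknown version")
        self._versions.append(copy.deepcopy(self._versions[version]))
        self._log.append({"rollback_to": version})
        return len(self._versions) - 1


layer = VersionedLayer()
v1 = layer.submit("building-1", {"floors": 3},
                  {"contributor": "alice", "method": "field survey"})
v2 = layer.submit("building-1", {"floors": 30},
                  {"contributor": "bob", "method": "digitisation",
                   "notes": "number of floors uncertain"})
v3 = layer.rollback(v1)   # undo the doubtful edit; history is preserved
print(layer.current)
```

Because a rollback is stored as a new version rather than by deleting history, the overwritten contribution and its metadata remain available for later review.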


3.5  Feedback  to  the  Community 

Finally, feedback to the community may include participation in discussion
forums, which may help other volunteers to create higher quality data.
Perret  et  al.  (2015)  highlighted  the  fact  that  VGI  projects  should  continuously 
evolve  through  the  feedback  each  contributor  gets  from  and  gives  to  others,  for 
instance  in  terms  of  how  a  certain  problem  encountered  while  collecting  data 
was  solved  or  any  other  recommendations  or  guidance.  Communication  chan¬ 
nels  with  the  VGI  project  managers  and  administrators  should  be  provided  as 
well so that the project itself can evolve based on the user feedback. Thus, a
continuous cycle is formed that improves the protocol and enhances the overall
VGI  project  quality.  This  way,  common  mistakes  will  hopefully  start  to  disap¬ 
pear  and  overall  data  quality  will  be  improved. 

4  Applying  the  Protocol  to  Real-World  Examples 

In  this  section  we  present  two  hypothetical,  extended  examples  of  real-world 
applications  of  the  VGI  vector  data  protocol  described  above.  In  the  first  exam¬ 
ple,  the  protocol  is  applied  to  the  updating  and  collection  of  new  thematic 
information  in  a  topographic  building  database.  In  the  second  example  the 
protocol  is  applied  to  a  different  domain,  that  is  the  collection  of  photographs 
for  land  use  /  land  cover  (LULC)  mapping. 


4.1 Updating and Collecting New Thematic Information in a
Topographic  Building  Database 

In  this  example,  an  NMA  is  interested  in  exploiting  crowdsourced  vector  data 
to  improve  their  topographic  building  database.  This  improvement  includes 
enriching  and  updating  existing  building  objects  (their  geometry  and  thematic 
information)  and  capturing  new  building  objects  and  associated  thematic 
information. Buildings are typically very well mapped by NMAs, but the rapid
pace of urban change can make keeping their databases up to date challenging
in terms of resources. Additionally, the thematic information within
these databases is often very poor. Information that is often missing
includes the function of the building, the number of floors in the building,
cultural heritage information related to the building, the entrance(s), etc. As
an additional challenge and motivation for VGI contributors, the NMA seeks
to create a new layer from scratch to represent the entrances to buildings. This
will be a multi-point layer, since a building might have more than one entrance.
In this example, the NMA decides to develop a Web-based application to allow
citizens to collect data. Implementing a protocol for this application will
greatly help to reduce the submission of low-quality data. Specifically, the
Web-based application will use digitisation and field surveys as the means of
collecting vector data. The application will present contributors with three
layers: a base layer consisting of up-to-date orthoimagery of the region
represented in the database; an overlay layer of the existing
topographic building object database; and a layer for the entrances to buildings.
Contributors will be encouraged to create and/or update the geometry and/or
thematic information of building objects to reflect recent changes to building
function, structure, etc. Additionally, contributors will be able to add vector
point data to building objects to indicate the position of building entrances
along with their door numbers. The implementation of the vector data protocol
for this application will ensure that helpful advice and guidance is provided to
all contributors in an attempt to maintain and ensure good quality. Guidance is
provided for a number of categories:

•  Scale:  Select  the  appropriate  cartographic  scale  for  building  level  of  detail, 
and  preserve  it  over  the  collection  and  contribution  process; 

•  Shape:  Preserve  building  shape  as  much  as  possible  (for  instance  keep  the 
building  corners  squared  whenever  convenient)  and  digitise  minimum 
details  appropriate  to  the  scale; 

•  Logical  Consistency:  Ensure  that  new  buildings  contributed  or  existing  ones 
that  are  changed  are  always  closed  polygons  and  do  not  overlap; 

•  Geometric  Consistency:  Ensure  that  multiple  entry  points  to  buildings  are 
represented  as  a  multi-point  object  rather  than  creating  a  new  point  object 
for  each  individual  entrance  in  the  same  building,  and  that  door  numbers 
for  each  entry  point  are  different; 

•  Thematic  Quality  Control:  Propose  a  list  of  thematic  attributes  and  values 
to  the  user; 

•  Metadata:  Allow  free  text  comments  on  the  visual  quality  (such  as  cloud 
cover,  tree  cover,  shadows  or  resolution)  of  the  imagery. 
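
As an illustration only, the logical and geometric consistency guidance above could be backed by simple automated checks at digitisation time. The sketch below is not part of the protocol itself: the data layout (a dictionary of building identifiers mapping to a footprint ring and a list of entrances) and all function names are assumptions made for this example, and the overlap test is a coarse bounding-box screen rather than a true polygon intersection.

```python
from itertools import combinations


def is_closed_ring(ring):
    """A ring is closed when it has at least four vertices and the
    first and last vertices coincide."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a non-empty ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """Coarse screen only: True when two bounding boxes share interior area."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


def check_buildings(buildings):
    """Return warnings for a dict of the (hypothetical) form
    building_id -> {"ring": [(x, y), ...],
                    "entrances": [(door_number, (x, y)), ...]}."""
    warnings = []
    for bid, building in buildings.items():
        # Logical consistency: footprints must be closed polygons.
        if not is_closed_ring(building["ring"]):
            warnings.append(f"{bid}: footprint is not a closed polygon")
        # Geometric consistency: door numbers within a building must differ.
        doors = [door for door, _ in building.get("entrances", [])]
        if len(doors) != len(set(doors)):
            warnings.append(f"{bid}: duplicate door numbers")
    # Logical consistency: flag pairs of footprints that may overlap.
    for (id_a, a), (id_b, b) in combinations(buildings.items(), 2):
        if boxes_overlap(bounding_box(a["ring"]), bounding_box(b["ring"])):
            warnings.append(f"{id_a}/{id_b}: footprints may overlap")
    return warnings
```

In a real application, pairs flagged by the bounding-box screen would be passed to an exact polygon intersection test before a contributor is warned.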

The  five  steps  of  the  protocol  workflow  outlined  in  Section  2  are  applied  to  this 
example  as  follows: 

Initialisation - Citizens will need to register on the Web-based
application to use it and contribute vector data and information.
Before collecting data, every contributor will need to complete all of the
steps in a tutorial demonstration to understand which tasks are required
and to familiarise themselves with the processes and tasks in general and
with the goals and objectives of the project. Depending on the
resources available, the NMA may develop a protected ‘sandbox’ version
of the application, where contributors can test out the functionality of the
application on a small subset of the topographic buildings database without
actually making changes to the real database. This form of training will
aid learning and help volunteers contribute effectively while still preserving
their motivation.

Data  Collection  -  Contributors  will  be  encouraged  to  carefully  plan  their 
collection  of  new  or  updated  data/information  for  the  application.  The 
application  will  specifically  allow  the  digitisation  of  building  objects  on  top 
of  the  orthoimagery,  the  addition  of  vector  point  data  on  building  entrances, 
and  the  provision  of  new  or  updated  thematic  information  associated  with 
building  objects.  The  software  application  will  give  prompts  and  tips  to  the 
contributors as they are working.

Self-Assessment/Quality Control - The application will provide functionality
to allow contributors to make an initial assessment of the quality of the
new data or changes to existing data that they are submitting. For example,
if a contributor creates a new building footprint and does not supply any
thematic information, the application would indicate this to the contributor.
The contributor would then be presented with a generic list of thematic
information from which they can choose the appropriate annotations. This
would help emphasise the importance of thematic information in the
application, given that many users may attach greater importance to
geometrical data.

Data Submission - In this step, contributors submit their vector
data and/or thematic information to the application. The application
will provide a space where contributors can provide metadata or descriptive
information about their contribution. This could be used by the NMA
to assess the overall quality of the contribution, as this information would
describe the processes that the contributors used to make their contributions.

Feedback to the Community - The NMA will create a number of information
channels to encourage contributors to provide feedback on and discuss
their experiences of using the application and contributing vector
data with it. This feedback can include discussions on problems
encountered with specific building types or structures, with certain
thematic areas, etc. Through these channels, the NMA can provide assistance
and feedback to the contributors in the community by offering suggestions
on how problems may be fixed or resolved within the application.
This creates a complete feedback loop within the vector protocol, which will
allow the protocol to be continuously improved.
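
The self-assessment step described above, in which the application notices missing thematic information and offers a list of values, might be sketched as follows. This is a minimal illustration under stated assumptions: the attribute names and allowed values in `THEMATIC_ATTRIBUTES` are hypothetical, and an NMA would substitute its own schema.

```python
# Hypothetical attribute schema; None means any value is accepted.
THEMATIC_ATTRIBUTES = {
    "function": ["residential", "commercial", "industrial", "public", "other"],
    "floors": None,
    "heritage": ["listed", "not listed", "unknown"],
}


def self_assess(feature):
    """Return prompts for missing or unrecognised thematic attributes
    of a contributed building feature."""
    prompts = []
    attrs = feature.get("attributes", {})
    for name, allowed in THEMATIC_ATTRIBUTES.items():
        value = attrs.get(name)
        if value is None:
            # Offer the list of values so the contributor can simply pick one.
            hint = f" (choose from: {', '.join(allowed)})" if allowed else ""
            prompts.append(f"Missing '{name}'{hint}")
        elif allowed is not None and value not in allowed:
            prompts.append(f"Unrecognised value for '{name}': {value!r}")
    return prompts
```

Returning prompts rather than rejecting the contribution keeps the step advisory, which is in the spirit of guiding contributors without discouraging them.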


4.2  Using  Geotagged  Photographs  for  LULC  Mapping 

In this example, an NMA is interested in exploiting geotagged photographs
to improve their LULC maps, and in particular to provide much more data
for training their classification algorithms and also to validate the map, if
possible. The NMA has already experimented with the use of photographs from
existing photo-sharing sites such as Flickr and Panoramio, but it was observed
that there was too much inconsistency in the tags and in the content of the
photographs, and thus that not all photographs were usable for the purpose of
LULC mapping. Also, there was a strong spatial bias in the distribution of the
photographs and not all required LULC types were captured.

Instead, the NMA decides to develop its own national-level photograph-sharing
site specifically for the purpose of collecting photographs for LULC
mapping, which will have a stricter protocol and ensure more usable content
and tags. At the same time, the data collection protocol should not hamper
creativity or the spontaneous enthusiasm that drives contributors while aiming
for the huge volumes of data that are a characteristic of popular social media
sites. The NMA decides to develop a customised mobile-based photograph-sharing
application, which can use technology to help ensure that specific parts
of the data collection protocol are adhered to. The application should have the
following features:

•  Contributors will be taken through a step-by-step procedure for each
location photographed;

•  This procedure will require the contributor to take either a set of
photographs in the four cardinal directions or a single 360-degree photograph. If
the participant chooses the option of taking photographs in four different
directions, then the compass in the mobile device will only allow the user to
take a photograph when facing the correct cardinal direction;

•  The  application  will  prevent  participants  from  using  the  zoom  function, 
ensuring  that  the  photographs  show  content  closest  to  their  geographic 
position; 

•  A  ‘guide  line’  will  be  added  to  the  application  so  that  the  contributors  can 
line  up  the  horizon  with  the  ‘guide  line’,  so  that  photographs  containing 
one-third  sky  and  two-thirds  landscape  are  taken; 

•  The  photograph  should  be  dominated  by  landscape  but  without  restricting 
the  addition  of  other  elements  (such  as  people  and  animals);  moderators  or 
automated  methods  can  be  used  to  assign  weights  to  these  photographs  for 
the  purpose  of  LULC  creation/validation; 

•  Once the photographs are taken, the participant will be asked to assign
tags from a pre-specified list (drawn from the LULC nomenclature used by the
NMA) to the photographs, which will be mandatory, and will also be able to
add free-form tags, which will be optional;

•  The final step in the procedure will be to ask contributors to estimate the
distance at which the LULC changes, to indicate how homogeneous or
heterogeneous the landscape is;

•  There  will  be  at  least  two  modes  of  operation  in  the  protocol.  In  the  first 
mode,  participants  can  take  photographs  at  any  location,  so  the  geotagged 
photographs  will  be  useful  for  creating  LULC  training  datasets;  in  the  second 
mode,  participants  will  be  sent  to  specific  locations,  or  ‘quests’  in  the  form  of 
photograph-caching,  which  can  be  used  to  satisfy  the  sampling  needs  of  the 
NMA  for  LULC  map  validation  and  reduce  the  spatial  bias  that  is  common 
in  geotagged  photographs  from  social  media  photograph-sharing  sites. 
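
Two of the features listed above lend themselves to a short sketch: enabling the shutter only when the device faces a cardinal direction, and requiring at least one tag from the pre-specified list. The code below is an illustration under assumptions, not a description of any existing application; the 10-degree tolerance and the function names are invented for this example.

```python
# Bearings of the four cardinal directions, in degrees clockwise from north.
CARDINALS = {"N": 0.0, "E": 90.0, "S": 180.0, "W": 270.0}


def angular_difference(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def shutter_enabled(heading_deg, target, tolerance_deg=10.0):
    """Allow a photograph only when the compass heading is within the
    (assumed) tolerance of the requested cardinal direction."""
    return angular_difference(heading_deg, CARDINALS[target]) <= tolerance_deg


def validate_tags(selected, nomenclature):
    """At least one tag from the pre-specified list is mandatory;
    free-form tags are allowed but do not satisfy the requirement."""
    return any(tag in nomenclature for tag in selected)
```

The wrap-around handling in `angular_difference` matters for north, where headings of 355 and 5 degrees are both only 5 degrees away from the target.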

As  much  as  possible,  elements  of  the  protocol  will  be  hidden  or  incorporated 
seamlessly  into  the  workflow  of  the  application  through  technology.  In  other 
cases,  the  protocol  will  be  implemented  via  elements  of  gamification,  which 
will  be  added  to  maintain,  if  not  grow,  the  pool  of  participants  and  to  create  a 
certain  level  of  competition  among  them,  particularly  for  the  photo-caching 
mode of the application.

Following the vector protocol outlined in Section 2, the five steps are applied
as follows:

Initialisation  -  This  first  stage  will  be  achieved  by  providing  contributors 
with  a  guided  tour  of  the  project,  including  information  on  how  each  step 
contributes  to  the  overall  objectives  of  the  project.  In  addition,  step-by- 
step  instructions  will  be  provided  to  contributors  when  they  first  use  the 
application.  The  guided  tour  will  be  mandatory  yet  short  and  easy  to  follow. 
Once  the  user  has  ‘passed’  through  this  stage  and  become  familiar  with  the 
function of the application, they will be able to take further photographs.

Data Collection - This will be implemented via field survey, which will be
facilitated by the mobile application. As outlined above, there will be two
main modes of data collection where participants can: (i) photograph
landscapes in any location or (ii) be directed to specific locations. Optionally, a
third mode will be possible in which participants can turn off the protocol
and photograph freely. The purpose of these three modes will be clearly
explained to the participants. The mode employed will also allow the NMA
to categorise the photographs for a specific use: the first mode may be more
suitable for LULC map creation; the second for LULC map validation; while
the third can be either omitted or used for training after careful checking.

Self-Assessment/Quality Control - In this step the mobile application will
record the positional accuracy and other related parameters (such as dilution
of precision (DOP) and type of GPS receiver) as an additional source
of information to accompany the photographs. Through the application,
the contributor will also estimate the heterogeneity of the LULC, which
will provide the NMA with an indication of whether the photograph is in a
homogeneous or mixed land cover class. A mechanism will be implemented
that allows contributors to review the photographs in order to
make sure that they comply with the protocol and are of sufficient quality.
Contributors will be given the option to retake photographs that are of
poorer quality. For instance, in this stage the app will display the position of
the photographs taken on top of orthoimagery in order to easily spot positions
recorded with low accuracy.

Data  Submission  -  The  application  will  not  require  data  connection  in  the 
field  but  will  automatically  synchronise  the  photographs  when  connected 
to  wifi,  so  that  poor  mobile  signals  will  not  be  an  issue.  Once  photographs 
are  submitted,  the  online  application  will  allow  contributors  to  view,  share 
and  manage  their  photographs,  for  instance  to  correct  the  tagging  of  their 
photographs and thereby improve the labels needed for LULC classification.

Feedback to the Community - The final step will consist of sending out
regular information-rich newsletters to contributors, giving them information
about levels of improvement in LULC mapping, highlighting those
areas that have been better mapped and featuring the contributions of
active contributors. The newsletters will also highlight which areas are
missing and guide participants to go out and photograph these areas. At this
stage, the online application will also allow contributors to rate the
contributions of other participants and start conversations and discussions in
order to exchange and share suggestions that would lead to an overall
improvement in the project’s data quality.
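
The positional parameters recorded in the self-assessment step could be used to flag photographs for review or retaking. The following is a minimal sketch only: the record fields and the threshold values (15 m accuracy, DOP of 5) are assumptions chosen for illustration, not values recommended in this chapter.

```python
from dataclasses import dataclass


@dataclass
class PhotoRecord:
    photo_id: str
    horizontal_accuracy_m: float  # accuracy estimate reported by the device
    hdop: float                   # horizontal dilution of precision


def needs_review(record, max_accuracy_m=15.0, max_hdop=5.0):
    """Flag a photograph whose recorded position looks unreliable.
    Both thresholds are illustrative assumptions."""
    return record.horizontal_accuracy_m > max_accuracy_m or record.hdop > max_hdop


def split_for_review(records, **thresholds):
    """Partition records into (acceptable, flagged) lists, preserving order."""
    ok, flagged = [], []
    for record in records:
        (flagged if needs_review(record, **thresholds) else ok).append(record)
    return ok, flagged
```

Flagged photographs would be the ones highlighted on top of the orthoimagery so that the contributor can decide whether to retake them.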

Although some research on using geotagged photographs for LULC training
and validation has been undertaken in the past (see e.g. Antoniou et al., 2016),
this example is still largely hypothetical. However, a similar protocol for
collecting geotagged photographs for LULC-related purposes is currently being tested
by the FotoQuest Europe student campaign7. This initiative asks volunteers to
survey specific locations with the purpose of validating the official EU LULC
datasets derived from the Land Use and Coverage Area frame Survey (LUCAS)
performed by EUROSTAT8. For more information on what geotagged photographs
can offer, see Chapter 4 (Touya et al., 2017) on using geotagged photographs
for examining OSM quality and for verifying the applicability and
suitability of various cartographic processes.


5  Discussion  and  Conclusions 

VGI has become a mainstream presence in the GIScience domain. By its very
nature, the driving force behind VGI lies in the crowd. The progressive
mitigation of the digital divide - not just the traditional one that considers Internet
access, but also the second-level digital divide that looks at the real capacity of
people to make use of available technology (Hargittai, 2002) - will likely result
in an ever-increasing number of contributions uploaded to VGI initiatives.
Statistics9 and predictive models (Jokar Arsanjani et al., 2015a) for the OSM
project confirm increasing growth in both the number of new contributors and the
amount of submitted data, while Mooney and Winstanley (2015) have argued that VGI
contributions can be considered a form of big data. In turn, the increase in VGI
may also increase the heterogeneity of contributions, and hence resolving quality
issues when assessing VGI usability may become harder in the future.

In citizen science projects, especially those in the field of conservation and
ecology, protocols and guidelines for data collection are generally well
developed and clearly accepted by the contributors. In contrast, by its very
nature, the world of VGI has developed in a much freer, more diverse and often
uncontrolled fashion. Even OSM, which since its birth has dominated the VGI
scene, features a culture of freedom in terms of what is mapped and which
tags are provided. Hence, this chapter has investigated the need and
opportunity to integrate protocols in order to govern and guide the data collection
process in active VGI projects, with the purpose of increasing the quality of
volunteer contributions. A general and flexible protocol was introduced and
described, which can be exploited to standardise data collection processes in
VGI initiatives. The protocol is suitable for implementation in new as well as
existing VGI projects and can serve as a reference tool, not just for the project
volunteers, but also for the project managers and developers who need to put
in place the best possible system to facilitate collection of high-quality data.
The implementation of the proposed protocol was illustrated through two
different hypothetical examples.

The first example sees an NMA developing an application for crowdsourced
data collection aimed at enriching and improving its topographic buildings
theme. Data collection includes improving and updating existing building
objects (geometry and thematic information) and capturing new features
related to buildings, such as entrances, and associated thematic information. The
implementation of the vector data protocol for this application will ensure that
helpful advice and guidance is provided to all users in an attempt to maintain
and ensure good quality as citizens are contributing changes and new content.
The protocol provides guidance on building scale, building shape, logical
consistency of building polygons, geometric consistency of entry points to
buildings, thematic quality and the provision of metadata. Crucially, the use of a
protocol here will allow the NMA to outline guidance on these issues so that
high-quality data can be captured. The workflow of the protocol (initialisation,
data collection, self-assessment/quality control, data submission and feedback
to the community) provides more structure to the contribution process for all
users regardless of their background skills or technical abilities.

The second example, which implements the protocol for the collection
of geotagged photographs for LULC mapping, involved the hypothetical
development of a customised photograph-sharing application by an NMA.
However, it could also be beneficial for existing photograph-sharing sites like
Flickr and Panoramio to adopt elements of the proposed data collection
protocol, recording and providing access to a minimum set of metadata. First,
locational information is a common feature of modern mobile phones and some
digital cameras, so storing and providing the location as standard information
does not present any additional burden to these providers. Moreover, the
positional accuracy of handheld devices continues to increase, and there are early
efforts to also extend this increased accuracy to indoor positioning (Mautz,
2009; Kuo et al., 2014), so the locational quality of information will continue
to improve in the future. Similarly, it could be beneficial to record other
elements, such as camera orientation, tilt, etc. These metadata are not only
useful for geomatics applications but are also of interest to other domains. A prime
example is that of user-contributed tags. From touristic applications (Majid
et al., 2013) to early response systems (Maso et al., 2011), tags are considered a
semantically rich source of information that needs to be further enhanced. Also,
the photograph-sharing repositories themselves can gain valuable insights from
more complete and rich contributions, since these can be analysed to improve
the repositories’ own services and attract more participants.

Recognition of the need for protocols to guide future VGI projects is
clearly lacking. Hence this chapter has attempted to provide a generic set of
guidelines that can help VGI projects consider what elements are necessary
to ensure that a minimum data standard is reached while still motivating and
sustaining participation. Within this broader project protocol, a protocol for
data collection is needed, where we would argue that technology should be
used to seamlessly integrate components of the protocol as much as possible,
thereby reducing the burden of compliance on contributors. This work
provides fruitful ground for future research. The proposed protocol was conceived
in a sufficiently general way that it can potentially be applied to any VGI
project. Based on the multiple recommendations and suggestions provided in
this chapter, we feel that detailed, customised versions of the protocol can now
be created and applied easily to specific VGI initiatives, and that future VGI
projects would benefit greatly from adhering to the protocol when designing
the data collection process. Applying the protocol to existing or future projects
would also serve as a way to determine the value of the protocol itself and to
suggest possible improvements. Finally, exploiting the protocol to revise the
way in which VGI is collected in a project would allow for the comparison of
the quality of data produced before and after the protocol’s introduction and
would therefore help to assess its effectiveness.


Notes 


1  https://hotosm.org 

2  http://www.openstreetmap.org 

3  http://wiki.openstreetmap.org/wiki/Map_Features 

4  http://navigator.er.usgs.gov/help/vgistructures_userguide.html 

5  https://www.geohistoricaldata.org 

6  http://confluence.org 

7  http://www.fotoquest-europe.com 

8  http://ec.europa.eu/eurostat/web/lucas/overview 

9  http://wiki.openstreetmap.org/wiki/Stats


Abstract

The rapid expansion of citizen science projects and crowdsourcing applications
is yielding a huge and varied pool of Volunteered Geographic Information
(VGI) on a wide variety of themes. This VGI may be of huge value for
institutions, individuals and decision-makers, but only if it can be discovered,
evaluated for quality and fitness-for-purpose and combined with data from other
sources. If VGI data are to be discovered, used and reused to their full potential,
they must be actively managed. In this chapter we assess the current state of
the art regarding data management practices in VGI, identify some challenges,
obstacles and best-practice examples, and review a range of developing and
established open source technologies which can underpin robust and
sustainable data management for VGI. We conclude that VGI is likely to remain patchy
and heterogeneous and that existing standards may not be exploited to their
full potential. Nevertheless, automated support for documenting the
generation and use of VGI, as well as annotations following the Linked Data
paradigm, can help to improve interoperability and reuse. We were able to
identify good practices within different existing systems, but more research and
development work is needed in order to support their joint application for the
benefit of VGI. New data management methodologies can only succeed if their
benefits (for example, simplifying administration or lowering the entry barrier
to data publication) exceed the implementation costs.


Keywords 

Data  Management;  Quality  Assurance;  Quality  Control;  Interoperability;  Open 
Standards 


1  Introduction 

The visibility and perceived importance of VGI projects and citizen science is
continuously increasing, and this book offers insight into many aspects of
user-generated content and VGI collections. In this chapter, we summarise some
insights on good practice for the storage and dissemination of this type of data.

Data collection and information retrieval in crowdsourcing or VGI projects
may happen on very different spatial and temporal scales and in diverse thematic
areas, and may involve very varied groups of contributors in terms of
expertise and interests. VGI campaigns can include, for example, short-term
emergency response projects (e.g. after earthquakes and other natural disasters)
that exploit volunteered observations along with repurposed information
harvested from social media; Citizens’ Observatories such as those funded by the
European Commission1, which have structured and strategic goals to foster
‘... general public engagement in scientific research activities when citizens
actively contribute to science either with their intellectual effort or
surrounding knowledge or with their tools and resources...’ (Socientize, 2013); or
well-established infrastructures and frameworks such as the Global Biodiversity
Information Facility (GBIF), which has collated and registered decades-worth
of global species data.

Inherently,  such  initiatives  have  quite  heterogeneous  requirements  for  data 
cataloguing,  access  to  data,  licensing  and  long-term  availability  of  data,  but 
they  do  (or  at  least  they  should)  share  some  general  ‘good  practice  principles’ 
of  data  management.  These  principles  include  aspects  such  as  how  to  securely 
store  data;  how  to  grant  access  and  to  whom;  how  to  document  data  so  they  can 
be  found  by  humans  or  machines  for  specific  purposes;  and  how  to  develop  a 
common  understanding  of  the  meaning  of  collected  information  so  that  data 
can  be  understood  and  used,  at  the  very  least  within  the  context  of  the  original 
project,  but  potentially  also  outside  that  domain. 

In 2014, the Joint Research Centre (JRC; the EC’s science service) in Ispra,
Italy, conducted a ‘Citizen Science and Smart Cities Summit’ and summarised
in a technical report (Craglia and Granell, 2014) that, at the time of
writing, ‘... there [was] little interoperability and reusability of [user-generated]
data, apps, and services developed in each project.’ A follow-up survey
reinforced these conclusions, especially in relation to data management practices
in citizen science projects (Schade and Tsinaraki, 2016). Acknowledging these
observations, this chapter summarises good practice recommendations in
data/metadata management and curation, as well as details on international
standards and cross-community interoperability that can potentially overcome
the identified shortcomings. Proper application of these principles could
permit seamless integration of data sources from different domains into coherent
information that can be reused beyond the scope of the original problem - thus
taking user-contributed content ‘to the next level’, i.e. making the data
discoverable, easier to reuse and thus even more valuable.


2  Data  Management  Overview 

This section first provides the required background on the topic and then
turns to some of the most central aspects of data management. We focus
on those items that cut across all types of data and data sources, and highlight
the foundational issues that should be addressed in data management and the
related planning processes.


2.1  Background 

Data appear in many different forms and originate from an ever-increasing
number of sources - and VGI is no exception. VGI has huge potential to enrich
the data portfolios of the public sector (e.g. environmental measurement
stations, earth observing satellites, land surveys and consultations) and of the
private/corporate sectors (e.g. mobile phone data, sensor measurements inside
vehicles, market studies, etc.). However, the heterogeneous nature of VGI
presents challenges for integrating with these ‘traditional’ data assets, which are
generally structured according to the application domains from which they
arise, and formatted according to industry standards, which may or may not
be open-source. As seen from the concrete examples in this book, VGI can
encompass a wide range of measurement and observation types, including GPS
tracks, digitised vector graphics, occurrence information, tagged photographs
and sound recordings, and observations of individual species over time.

Each  of  these  datasets  is  generated/collected  for  an  intended  purpose  (i.e.,  to 
deliver  some  value  for  a  beneficiary),  and  is  dealt  with  in  a  particular  way.  In 
other  words,  it  is  ‘managed’  in  one  way  or  another  -  independently  of  the  avail¬ 
ability  of  any  form  of  data  management  plan.  The  approaches  by  which  data 
in  general,  and  VGI  in  particular,  are  managed  diverge  greatly,  and  are  highly 
dependent  on  the  context  of  generation  and  use.  For  example,  data  collected 
locally  in  a  field  trip  to  teach  a  small  group  of  students  about  digital  cartography
might  be  kept  on  an  SD  card,  be  copied  to  several  desktop  computers  at  the
university  and  be  deleted  as  soon  as  the  course  ends.  By  contrast,  worldwide
observations  about  species  occurrences  might  be  fed  into  a  well  networked  structure
in  order  to  contribute  to  a  global  collection  effort  which  will  curate  those  data 
for  generations  of  scientists  and  environmental  organisations. 

Although  it  might  be  debatable  whether  every  single  collected  dataset  should 
be  preserved  for  potential  future  use,  sharing  of  volunteer-generated  data  is 
a  part  of  the  unspoken  contract  with  the  original  contributors  that  underlies 
citizen  science,  and  can  be  crucial  in  maintaining  the  commitment  of  volunteers.
Bearden  (2007)  records  how,  in  the  absence  of  feedback  on  their  mapping
efforts,  volunteer  USGS  contributors  ‘...  would  become  alienated  when  they
realized  that  their  meticulous  work  would  not  be  used  in  the  foreseeable  future
...’.  In  a  broader  context,  if  data  are  likely  to  be  usable  for  science,  then,  following
recent  moves  towards  reproducibility,  they  must  be  made  reusable.  These
requirements  for  repeatability,  transparency  and  independent  evaluation  inevi¬ 
tably  suggest  a  need  to  curate  and  preserve  data  collections.  With  the  growing 
availability  of  data  storage  and  data  sharing  capacities,  many  of  the  technical 
needs  are  well  addressed.  However,  organisational  peculiarities  and  the  differ¬ 
ences  between  communities  of  practice  mean  that,  in  reality,  multiple  different 
approaches  can  be  applied.  While  some  thematic  areas  and  communities  have 
well  established  and  internally  consistent  approaches  to  data  handling  and  sharing,
those  experiences  and  practices  are  rarely  exchanged  widely  across  parties
with  different  interests.  To  give  an  example:  the  geospatial  community  (or,
more  strictly  speaking,  the  spatial  data  infrastructure  (SDI)  community)  has
developed  in-depth  knowledge  and  best-practice  recommendations  on  managing
geographic  and  other  spatial  information  using  web  services  -  especially
under  the  ISO  Technical  Committee  on  Geographic  Information/Geomatics
(ISO/TC211)  and  the  Open  Geospatial  Consortium  (OGC).  Interconnections
with  the  biodiversity  and  nature  conservation  community  have,  by  contrast,
until  recently  been  limited  to  a  few  dedicated  projects,  including,  for  example,
EU  BON2  and  COBWEB3.  However,  as  citizen  science  moves  into  a  new  era  of
data  aggregation  and  harmonisation,  this  situation  is  changing  fast,  making  a 
discussion  of  data  management  practices  especially  topical  in  the  domain  of 
VGI.  We  will  re-visit  some  of  the  SDI  community  standards  below,  in  order  to 
indicate  reuse  potentials. 

While  each  individual  collection  of  VGI  is  valuable  to  preserve  per  se,  VGI 
also  has  reuse  potential  for  purposes  that  might  not  have  been  initially  fore¬ 
seen.  These  purposes  might  include  longitudinal  studies  on  the  use  and  evolv¬ 
ing  concept  of  VGI  itself,  but  could  also  involve  integration  with  other  data 
sources  and  interconnection  with  previously  unknown  data  flows  and  systems. 
It  is  therefore  an  emerging  practice  to  follow  common  standards  and  sup¬ 
port  interoperability,  in  order  to  avoid  introducing  artificial  barriers  to  such 
novel  and  unforeseen  usages  of  VGI.  The  Group  on  Earth  Observation  (GEO) 
recently  published  just  such  a  set  of  data  management  principles  for  the  Global
Earth  Observation  System  of  Systems  (GEOSS)4.  Simultaneously,  and  along  the
same  lines,  the  Belmont  Forum  -  a  group  of  the  world’s  major  and  emerging 
funders  of  global  environmental  change  research  -  released  their  data  princi¬ 
ples5.  The  latter  principles  focus  on  Findability,  Accessibility,  Interoperability 
and  Reuse  (FAIR)  and  will  be  used  as  a  lens  through  which  to  assess  the  state 
of  the  art  in  Section  3.


2.2  Organising  Data 

One  of  the  very  first  challenges  is  the  organisation  of  the  data  themselves.  Before 
even  considering  the  concrete  storage  format  and  structure  used,  it  has  to  be 
decided  at  some  point  which  items  are  considered  data  in  an  ‘atomic’  form,  and 
how  these  items  might  be  packaged.  As  we  will  see  later  in  the  chapter,  these 
early  decisions  will  impact  other  areas,  such  as  the  provision  of  (persistent) 
identifiers  or  the  granularity  of  metadata  (data  about  data).  In  the  context  of
airborne  imagery,  the  decision  could  be  whether  to  make  accessible  as  one  unit
a  whole  series  of  images  gathered  in  a  single  flight  or
whether  to  treat  each  single  scene  (image)  as  a  separate  dataset.  Analogously,  a
species  observation  could  be  put  into  a  collection  that  unites  all  data  relating  to 
a  particular  day,  person,  sensor  type  (e.g.  smartphone),  administrative  region, 
area  of  interest  (e.g.  a  natural  park),  field  campaign,  etc.  The  particular  choice 
of  grouping  will  depend  on  the  intended  use,  which  in  turn  will  define  the  dis¬ 
covery  and  access  needs. 
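
The effect of the packaging decision can be illustrated with a minimal Python sketch. It assumes a small, hypothetical list of observation records; the field names (observer, date, region, species) and all values are invented for illustration and are not taken from any particular VGI platform. The same atomic items end up in different collections depending on the grouping key that is chosen.

```python
from collections import defaultdict

# Hypothetical atomic observation records; field names and values are illustrative only.
OBSERVATIONS = [
    {"id": "obs-1", "observer": "ana", "date": "2016-05-01",
     "region": "park-A", "species": "Ardea cinerea"},
    {"id": "obs-2", "observer": "ben", "date": "2016-05-01",
     "region": "park-B", "species": "Ardea cinerea"},
    {"id": "obs-3", "observer": "ana", "date": "2016-05-02",
     "region": "park-A", "species": "Lutra lutra"},
]


def package(observations, key):
    """Group atomic observations into collections that share the value of `key`."""
    collections = defaultdict(list)
    for obs in observations:
        collections[obs[key]].append(obs["id"])
    return dict(collections)


# Three different packagings of exactly the same atomic items.
by_observer = package(OBSERVATIONS, "observer")
by_region = package(OBSERVATIONS, "region")
by_date = package(OBSERVATIONS, "date")
```

Each packaging implies a different unit for identifiers, metadata and access rules, which is why the choice is best made with the intended discovery and access needs in mind.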


2.3  Persistent  Identifiers 

Data  can  only  be  unambiguously  recognised  -  especially  when  they  are  shared 
with  other  people  -  if  they  can  be  uniquely  and  persistently  identified.  In  other 
words,  the  data  need  to  be  branded  in  some  way  that  does  not  change  over 
time.  If  the  data  are  to  be  accessible,  it  must  also  be  possible  to  resolve  that 
persistent  and  unique  identifier  into  an  appropriate  data  request. 

Without  going  into  too  much  detail  about  the  meaning  of  uniqueness  and 
identity,  it  obviously  makes  a  difference  whether  a  persistent  and  unique  identi¬ 
fier  is  assigned  to  every  ‘atomic’  data  item  or  to  collections  that  apply  any  of  the 
criteria  listed  above. 

The  meaning  of  persistency  also  has  to  be  challenged:  which  authorities  can 
guarantee  the  persistency  and  uniqueness  of  identifiers?  What  if  identifiers 
contain  the  names  of  institutions  or  groups  that  disappear  in  real  life?  Who 
can  guarantee  a  service  that  resolves  certain  identifiers  in  order  to  retrieve  the 
actual  dataset?  Furthermore,  it  has  to  be  noted  that  in  cases  where  unique  and 
persistent  identifiers  are  allocated  to  a  data  stream,  for  example  one  generated 
by  a  person  or  a  sensor,  the  retrieved  data  will  change  over  time.  In  practice,  the
identifier  could  resolve  to  the  latest  data  item  that  has  been  collected,  or  to  an
accumulated  collection.  Some  specific  mechanisms  for  minting  and  managing
persistent  identifiers  are  described  in  Section  3.
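
The two resolution behaviours for a data stream can be sketched as follows. This is a toy, in-memory resolver written for illustration only: the class names, the identifier "pid:sensor-42" and the readings are all assumptions, and no real identifier service works from a Python dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataStream:
    """A growing series of items that share one persistent identifier."""
    items: List[str] = field(default_factory=list)


class Resolver:
    """Toy resolver: maps a persistent identifier to the data it names."""

    def __init__(self) -> None:
        self._registry: Dict[str, DataStream] = {}

    def register(self, pid: str) -> None:
        # Uniqueness: an identifier may be minted only once.
        if pid in self._registry:
            raise ValueError(f"identifier already in use: {pid}")
        self._registry[pid] = DataStream()

    def append(self, pid: str, item: str) -> None:
        self._registry[pid].items.append(item)

    def resolve_latest(self, pid: str) -> str:
        """Resolve to the most recently collected item."""
        return self._registry[pid].items[-1]

    def resolve_all(self, pid: str) -> List[str]:
        """Resolve to the accumulated collection."""
        return list(self._registry[pid].items)


resolver = Resolver()
resolver.register("pid:sensor-42")  # hypothetical identifier
resolver.append("pid:sensor-42", "reading-1")
first_answer = resolver.resolve_all("pid:sensor-42")
resolver.append("pid:sensor-42", "reading-2")
second_answer = resolver.resolve_all("pid:sensor-42")
```

The identifier itself never changes, yet the two calls return different content, which is the behaviour that has to be documented for anyone citing a stream.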


2.4  Data  Documentation 

Are  we  able  to  use  a  dataset  that  we  created  ourselves?  Can  we  use  it  again  a  few 
years  after  we  collected  it?  How  are  others  supposed  to  find  that  dataset,  under¬ 
stand  what  it  really  encapsulates  (and  assess  if  it  might  be  valuable  for  their  work), 
access  it  and  provide  their  experiences  and  impressions  about  it?  The  answer  to 
all  of  these  questions  lies  in  metadata,  or,  in  other  words,  the  appropriate  docu¬ 
mentation  of  data  -  an  answer  which  is  more  easily  given  than  implemented. 

Documentation  is  required  for  a  wide  range  of  purposes  (e.g.  discovery,  eval¬ 
uation  and  use),  and  therefore  possible  forms  of  documentation  vary  greatly. 
Here,  again,  the  packaging  of  VGI  is  one  determining  factor,  since  one  might 
document  a  range  of  possible  ‘entities’,  for  example:  a  single  observation;  obser¬ 
vations  from  one  person  (including  also  a  description  of  that  person);  and  VGI 
collected  for  a  particular  area  (including  also  documentation  about  the  area). 
A  dataset  stored  as  a  collection  of  individual  observations  or  measurements 
might  include  information  about  the  accuracy  of  each  single  value;  it  has  to  be 
determined  how  this  accuracy  information  is  then  propagated  to  a  collection 
of  measurements  in  order  to  achieve  an  overall  quality  measure  for  the  dataset. 
If  a  user  is  filtering  this  dataset  for  potential  use  in  an  analysis  and  their  fitness- 
for-purpose  criteria  include  accuracy,  then,  in  theory,  this  aggregate  measure 
of  quality  should  be  recalculated  for  each  candidate  set  of  observations  -  a  con¬ 
siderable  challenge  for  the  architecture  within  which  the  data  are  being  curated 
and  made  accessible  for  discovery.  To  give  another  example,  in  a  VGI  data¬ 
set  where  observations  can  be  attributed  to  an  individual,  the  documentation 
might  include  the  reputation  of  this  individual  in  the  context  of  a  particular 
activity  or  community;  but  how  should  such  values  be  propagated  when  talking 
about  a  group  of  people?  At  the  time  of  writing,  accessible  and  robust  tools  for 
this  type  of  aggregation  are  lacking. 
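
The recalculation problem can be made concrete with a short Python sketch. It assumes, purely for illustration, that each record carries a positional accuracy expressed as a standard error in metres and that the root mean square of these values is an acceptable aggregate; both the records and the choice of aggregate are assumptions, and other propagation rules may be more appropriate for a given dataset.

```python
import math

# Hypothetical records: accuracy_m is a per-record standard error in metres.
RECORDS = [
    {"id": "a", "observer": "ana", "accuracy_m": 3.0},
    {"id": "b", "observer": "ana", "accuracy_m": 4.0},
    {"id": "c", "observer": "ben", "accuracy_m": 12.0},
]


def rms_accuracy(records):
    """Root mean square of the per-record accuracies (one possible aggregate)."""
    if not records:
        raise ValueError("cannot aggregate an empty selection")
    return math.sqrt(sum(r["accuracy_m"] ** 2 for r in records) / len(records))


def select(records, predicate):
    """Return the candidate subset that a user is considering."""
    return [r for r in records if predicate(r)]


# The aggregate must be recomputed for every candidate selection.
whole = rms_accuracy(RECORDS)
only_ana = rms_accuracy(select(RECORDS, lambda r: r["observer"] == "ana"))
```

A quality value stored once in dataset-level metadata would describe only the whole collection, whereas the filtered subset here has a noticeably different aggregate.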

Another  important  feature  of  documentation  is  the  semantics  used  to 
describe  what  is  actually  being  measured.  Terms  and  units  that  are  implicit  in 
one  domain  are  often  taken  for  granted,  and  not  necessarily  well  recorded  for 
communication  with  potential  users  in  other  fields.  For  example,  the  choice  of
code  list  (i.e.  the  fixed  terminology  of  a  particular  community)  used  to  constrain
keywords  about  a  data  collection  might  hinder  others  in  finding  the  data
collection  because  they  use  other  words  to  say  the  same  thing,  or  might  confuse 
people  expecting  something  completely  different  because  they  use  the  same 
word  to  say  something  else.  Only  where  semantic  mappings  between  code  lists 
are  available  can  these  cross-domain  discoveries  be  made  possible  and  reliable. 

Such  ‘cross-walking’  initiatives  are  very  valuable,  because,  by  contrast  to
free  text,  which  is  complicated  and  laborious  to  parse  and  mine,  code  lists  and
restricted  vocabularies  offer  an  efficient  way  to  speed  up  the  filtering  and
fitness-for-purpose  assessment  of  datasets.  Natural  language  processing  is  powerful
and  becoming  more  so,  as  can  be  seen  from  the  increasing  support  for
automated  systems  such  as  chatbots.  However,  these  systems  model  primarily 
social  contexts,  and  are  not  yet  coupled  to  the  kind  of  semantic  matching  and 
inference  that  are  needed  to  distinguish  the  correct  context  in  which  a  word 
is  being  used  to  describe  an  indicator,  unit  of  measure  or  phenomenon  across 
different  scientific  fields.  For  example,  if  a  user  is  searching  globally  for  datasets
that  include  numerical  estimates  of  uncertainty  or  variability,  they  could
search  for  free  text  descriptions  that  include  terms  such  as  ‘variance’,  ‘standard
deviation’,  ‘écart-type’  or  ‘intervalo  de  confianza’.  However,  the  presence  of  such
words  does  not  guarantee  that  variability  is  indeed  mathematically  described
within  the  dataset,  since,  for  example,  the  word  ‘variance’  can  also  be  used  in  a
qualitative  sense.  By  contrast,  a  URI6  identifies,  via  the  vocabulary  server  of  the
UK’s  Natural  Environment  Research  Council,  a  definition  of  ‘variance’  that  is
explicitly  mathematical  and  that  can  be  related  to  other  defined  statistical  concepts,
across  spoken  languages  and  scientific  domains.  A  similar  clarification  of
terms  such  as  ‘sea  level’  can  be  seen  at  the  SeaDataNet  vocabulary  server7. 
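
The difference between free-text matching and concept-based matching can be sketched in a few lines of Python. Everything in the example is hypothetical: the concept URIs under vocab.example.org, the two code lists and the three datasets are invented, and a real cross-walk would be published and maintained by a vocabulary service rather than held in a dictionary.

```python
# Hypothetical concept URIs; these are not real vocabulary entries.
VARIANCE = "http://vocab.example.org/concept/variance"
STD_DEV = "http://vocab.example.org/concept/standard-deviation"

# Cross-walk: (code list, term) -> shared concept URI.
CROSSWALK = {
    ("stats-en", "variance"): VARIANCE,
    ("stats-en", "standard deviation"): STD_DEV,
    ("stats-fr", "variance"): VARIANCE,
    ("stats-fr", "écart-type"): STD_DEV,
}

DATASETS = [
    {"title": "Bird counts", "codelist": "stats-fr", "keywords": ["écart-type"]},
    {"title": "Noise survey", "codelist": "stats-en", "keywords": ["standard deviation"]},
    {"title": "Planning variance requests", "codelist": None, "keywords": []},
]


def concepts(dataset):
    """Translate a dataset's keywords into concept URIs via the cross-walk."""
    pairs = ((dataset["codelist"], keyword) for keyword in dataset["keywords"])
    return {CROSSWALK[pair] for pair in pairs if pair in CROSSWALK}


def find_by_concept(datasets, concept_uri):
    return [d["title"] for d in datasets if concept_uri in concepts(d)]


def find_by_free_text(datasets, word):
    return [d["title"] for d in datasets if word.lower() in d["title"].lower()]


concept_hits = find_by_concept(DATASETS, STD_DEV)
text_hits = find_by_free_text(DATASETS, "variance")
```

The concept search finds both statistical datasets regardless of language, while the free-text search for 'variance' returns only a dataset that uses the word in a qualitative sense.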

For  this  reason,  many  classic  metadata  elements  allow  free  text  only  for  titles 
and  descriptions  but  require  selection  from  code  lists  for  everything  else.  We 
will  consider  some  examples  of  this  practice  below,  in  the  section  relating  to 
standards.  However,  there  are  times  when  there  is  no  substitute  for  human- 
readable  material  such  as  manuals  and  descriptions  of  research  methods,  and 
so  methods  for  adding  or  linking  these  to  VGI  datasets  as  annotations  must  be 
considered.  Such  documentation  can  encourage  the  dissemination  of  a  data¬ 
set  and  might  raise  the  reputation  of  those  who  created  it  -  see,  for  example, 
the  first  publication  within  the  newly  established  geospatial  dataset  description 
section  of  the  International  Journal  of  Spatial  Data  Infrastructures  Research8,
or  the  recently  launched  Data  in  Brief  journal9.  Such  documents  can  convey 
organisational  priorities  that  are  hard  to  capture  otherwise:  they  can  help  oth¬ 
ers  to  understand  the  deeper  intentions  behind  why  a  dataset  has  been  col¬ 
lected,  and  the  reasons  for  organisational  decisions,  thereby  contributing  to 
the  understanding  of  the  overall  purpose  and  potential  reusability  of  a  dataset. 

Last  but  not  least,  it  should  be  considered  whether  feedback  can  be  collected 
on  the  dataset  (at  whatever  level  of  granularity  the  packaging  allows).  Such 
feedback  might  include  ratings,  written  statements  and  references  to  cases  of 
reuse,  but  also  more  direct  indications  of  potential  error,  identified  needs  for 
updating,  etc. 


2.5  Sharing  -  With  Whom?

The  management  and  curation  of  datasets  is  not  only  an  exercise  for  those
gathering  and  hosting  data,  but  also  benefits  the  users,  whether  those  are
the  originally  intended  beneficiaries  or  new  user  groups  that  find  value
in  reusing  a  dataset  for  their  own  purposes.  Access  and  use  conditions
may  vary  -  e.g.  depending  on  privacy  and  legal  issues  (see  also  Chapter  6, 
Mooney  and  Minghini,  2017  on  privacy,  legal  issues  and  ethics),  commer¬ 
cial  interests,  or  an  organisation’s  commitment  to  Open  Science.  However, 
VGI  can  only  be  exploited  to  its  full  potential  if  these  conditions  are  clearly 
articulated  and,  ideally,  accompanied  by  the  relevant  licences.  The  decision 
to  integrate  or  split  VGI  into  collections  will  have  an  impact  here,  since  per¬ 
missions  on  different  elements  of  a  VGI  dataset  could  be  different,  meaning 
that  different  consumers  would  access  different  collections  of  records. 

Having  persistent  identifiers  and  a  minimum  set  of  documentation  (including 
contributors,  title  and  release  date)  in  place  also  enables  proper  data  citation  -  an 
element  that  should  not  be  underestimated.  On  the  one  hand,  citable  VGI  allows 
clear  reuse,  since  reference  can  now  be  made  not  only  to  other  scientific  articles, 
but  also  unambiguously  to  data  used  within  a  particular  activity.  On  the  other 
hand,  data  citation  also  provides  a  means  of  acknowledging  the  source  -  thereby 
contributing  to  the  recognition  of  the  data  contributors  and  owners  and  provid¬ 
ing  an  incentive  for  the  provision  of  metadata  and  curation  of  VGI.  It  is  likely 
that  new  metrics  for  scientific  reputation  (altmetrics)  will  very  soon  take  these 
achievements  into  account;  the  cross-referencing  of  datasets  and  the  numbers  of 
citations  will  become  essential  measures  of  impact. 


3  The  Role  of  Open  Standards  for  VGI  Data  Management 

In  the  above  discussion  we  have  identified  a  number  of  crucial  practices  for 
ensuring  the  usability  and  usefulness  of  VGI  data.  A  number  of  tools  and  pro¬ 
tocols  exist  which  can  support  these  practices,  and  key  among  these  are  the  var¬ 
ious  open  standards  which  allow  data  to  be  described,  structured,  exchanged, 
discovered  and  documented  in  ways  which  best  promote  interoperability  and 
reuse.  In  this  context,  we  use  the  word  ‘standards’  not  to  denote  quality  stand¬ 
ards,  which  are  addressed  in  Chapter  7,  but  agreed  schemas,  formats  and  pro¬ 
tocols  from  bodies  such  as  the  World  Wide  Web  Consortium  (W3C)10  and 
OGC11,  which,  by  virtue  of  being  open  for  free  use,  are  accessible  to  a  wide 
range  of  users  across  scientific  and  other  domains. 

In  the  following  section,  the  FAIR  principles  will  be  used  to  structure  dis¬ 
cussion  of  the  tools  and  approaches  that  are  available.  This  minimum  set  of 
foundational  principles  originally  derives  from  a  2014  workshop  that  brought 
together  a  wide  range  of  ‘academic  and  private  stakeholders  all  of  whom  had 
an  interest  in  overcoming  data  discovery  and  reuse  obstacles’.  The  principles 
have  been  subsequently  developed  and  refined  with  the  goal  of  ensuring  that 
‘research  objects  should  be  Findable,  Accessible,  Interoperable  and  Reusable 
(FAIR)  both  for  machines  and  for  people’  -  allowing  stakeholders  to  ‘more  eas¬ 
ily  discover,  access,  appropriately  integrate  and  re-use,  and  adequately  cite,  the 
vast  quantities  of  information  being  generated  by  contemporary  data-intensive
science’  (Wilkinson  et  al.,  2016).  FAIR  is  intended  to  be  domain-independent
and  to  be  applicable  to  data  archival,  management,  exploration,  discovery  and 
reuse  across  a  range  of  research  fields  and  scholarly  disciplines. 

Examples  have  been  chosen  from  the  current  practice  of  the  Global  Biodiversity
Information  Facility  (GBIF)  to  illustrate  certain  sections  of  FAIR.  The  reason  for
this  choice  is  that  GBIF  is  an  extremely  good  example  of  cross-domain  strategic 
thinking  where  standards  from  different  fields  have  been  employed,  adapted, 
influenced  and  developed  in  order  to  generate  a  highly  usable,  scientifically 
robust  repository  of  data  from  hugely  varying  sources  that  supports  hundreds 
of  high-quality  peer-reviewed  scientific  analyses  each  year12. 

The  FAIR  principles  are  as  follows: 

F1.  (meta)data  are  assigned  a  globally  unique  and  persistent  identifier
F3.  metadata  clearly  and  explicitly  include  the  identifier  of  the  data  it 
describes 

As  described  above,  data  can  only  be  sensibly  shared  and  reused  if  the  data 
resource  can  be  identified  and  reliably  retrieved.  Persistent  identifiers  are  unique 
strings  of  numbers  and/or  characters  that  are  assigned  to  a  digital  resource  (e.g. 
datasets,  documents,  images)  in  order  to  allow  long-term,  reliable  access  to  that 
specific  item.  Persistent  identifiers  should  ideally  be  managed  separately  from 
the  physical  location  of  the  resource,  ensuring  the  continued  accessibility  and 
discoverability  of  the  resource  no  matter  how  many  times  the  object  moves  to 
different  servers  or  property  rights  owners’  (USGS,  2017).  Actionable  persis¬ 
tent  identifiers  permit  access  to  the  resource  via  a  link,  which  should  remain 
resolvable  for  the  long  term.  An  example  that  is  widely  used  in  the  scientific 
domain  is  the  Digital  Object  Identifier  (DOI;  ISO  standard  26324:2012)13, 
which  allows  published  documents  and  datasets  to  be  tracked  and  cited,  and 
which  is  assigned  to  journal  publications  (or  prepublications)  by  CrossRef14,
Figshare15,  Zenodo16  and  other  platforms.  Recent  moves  towards  data  DOIs 
have  been  hugely  supported  by  initiatives  such  as  DataCite17,  NOAA’s  EZID18, 
or  DryadLab19,  which  enable  a  data  producer  to  mint  a  DOI  and,  in  some  cases, 
register  associated  metadata. 

An  example  of  current  practice  for  VGI  is  the  ability  of  the  GBIF  website  to
produce  and  maintain  a  DataCite  DOI  for  a  specific  user  request,  guaranteeing 
that  this  request  can  be  reliably  repeated  at  a  future  date.  Different  query  filters 
(date,  type  of  record,  species’  scientific  name,  country,  etc.)  are  collated  and 
stamped  with  a  DOI,  which  is  supplied  to  the  user  to  ensure  future  retrieval  of 
records  according  to  the  same  filters. 

A  DOI  can  be  allocated  at  a  level  of  granularity  specified  by  the  user,  but  the 
maintenance  of  relationships  (e.g.  hierarchical  ‘nestings’  of  DOIs)  is  the  respon¬ 
sibility  of  the  resource  owner,  and  can  be  challenging.  The  ability  to  discover 
related  datasets  in  this  way  is  extremely  powerful,  and  can  support  the  Linked 
Data  approach  described  more  fully  in  the  next  section.  Attention  to  versioning
is  also  important:  a  DOI  may  represent  the  final  version  of  a  resource,  approved
for  release;  an  extension  or  annotation  of  a  resource;  or  a  model/algorithm
version  used  in  a  reproducible  workflow  (in  this  context,  a  GitHub  or  Subversion
version  ID  can  be  adapted  to  fulfil  at  least  some  of  the  role  of  a  DOI).  However,
there  are  cases  where  a  DOI  will  always  return  ‘the  latest  version’  of  a  resource,
and,  here,  scientific  reproducibility  is  not  guaranteed.  GBIF  DOIs  are  a  good
example:  the  data  underlying  a  query  are  regularly  improved  and  updated,  and
historical  records  may  be  retrospectively  added,  meaning  that  the  exact  same  set
of  records  is  not  guaranteed  to  be  returned  when  a  DOI  is  used  at  a  later  date.
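
The distinction between a repeatable request and a reproducible result can be sketched in Python. The example below is not a description of how GBIF implements its download DOIs; the in-memory record store, the filter names and the use of a SHA-256 fingerprint to detect drift are all assumptions made for illustration.

```python
import hashlib
import json


def canonical_query(filters):
    """Serialise filters deterministically so that equal requests give equal keys."""
    return json.dumps(filters, sort_keys=True, separators=(",", ":"))


def run_query(store, filters):
    """Return the sorted ids of all records matching every filter."""
    return sorted(
        r["id"] for r in store
        if all(r.get(key) == value for key, value in filters.items())
    )


def fingerprint(record_ids):
    """Checksum of a result set, used to detect that the content has changed."""
    return hashlib.sha256("\n".join(record_ids).encode("utf-8")).hexdigest()


# Hypothetical record store and query.
store = [
    {"id": "r1", "country": "ES", "year": 2006},
    {"id": "r2", "country": "ES", "year": 2007},
]
filters = {"year": 2006, "country": "ES"}
query_key = canonical_query(filters)
at_download = fingerprint(run_query(store, filters))

# A historical record is added retrospectively.
store.append({"id": "r3", "country": "ES", "year": 2006})
at_reuse = fingerprint(run_query(store, filters))
```

The stored filters can be replayed exactly, but the two fingerprints differ, so a study that needs the original records would have to archive a snapshot (or at least its checksum) alongside the identifier.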

It  is  possible  to  embed  dataset  identifiers  within  metadata  using  existing  geospatial
metadata  standards,  such  as  ISO  19115  (footnote  20),  which  offers  a  CI_Citation
element  that  allows  an  identifier  such  as  a  DOI  to  be  supplied  in  a  structured 
manner  and  to  be  associated  with  a  namespace  that  can  help  to  ensure  the 
uniqueness  of  the  identifier.  However,  the  real-world  practice  is  less  consistent, 
as  evidenced  when  exploring  records  in  the  GEOSS  Common  Infrastructure 
(GCI):  here,  metadata  and  data  identifiers  are  found  in  a  wide  variety  of  loca¬ 
tions  within  catalogued  metadata  documents,  and  are  sometimes  completely 
absent.  This  problem  is  more  cultural  than  technical:  because  ISO  19115:2003
is  not  completely  clear  about  the  difference  between  data  and  metadata  identifiers,
and  lacks  a  clear  recommendation  on  the  use  of  Universally  Unique
Identifiers  (UUIDs),  profilers  have  generated  a  variety  of  different  identifiers  (if
they  have  generated  them  at  all)  and  have  located  these  identifiers
in  at  least  four  different  locations  within  metadata  documents  (Maso,
2013).  The  US  FGDC  metadata  standard  also  allows  the  encoding  of  a  vari¬ 
ety  of  references  to  data  and  metadata21,  but  also  requires  some  investment  of 
time  and  effort  for  proper  use.  In  the  next  section  we  discuss  the  implications 
of  these  standards’  complexity  for  VGI  initiatives  that  may  be  ephemeral  and 
poorly  resourced. 
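
As an illustration of the structured approach, the following Python sketch builds a simplified citation fragment that carries an identifier and its namespace. The element names follow the ISO 19139 XML encoding of ISO 19115 as we understand it; the title, the DOI "10.1234/example-dataset" and the code space value are invented, and the fragment omits mandatory elements (such as the citation date), so it should not be read as a schema-valid record.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the ISO 19139 encoding (gmd, gco).
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def text_element(parent, name, value):
    """Append a gmd element wrapping a gco:CharacterString to `parent`."""
    wrapper = ET.SubElement(parent, f"{{{GMD}}}{name}")
    ET.SubElement(wrapper, f"{{{GCO}}}CharacterString").text = value
    return wrapper


def citation_with_identifier(title, code, code_space):
    """Build a simplified CI_Citation holding one identifier and its namespace."""
    citation = ET.Element(f"{{{GMD}}}CI_Citation")
    text_element(citation, "title", title)
    identifier = ET.SubElement(citation, f"{{{GMD}}}identifier")
    rs_identifier = ET.SubElement(identifier, f"{{{GMD}}}RS_Identifier")
    text_element(rs_identifier, "code", code)
    text_element(rs_identifier, "codeSpace", code_space)
    return citation


# Hypothetical title, DOI and code space, for illustration only.
citation = citation_with_identifier(
    "Example VGI dataset", "10.1234/example-dataset", "doi"
)
xml_text = ET.tostring(citation, encoding="unicode")
```

Placing the identifier in one agreed location of this kind is what allows a harvester to find it without guessing, which is exactly what is missing in the inconsistent records described above.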

I3.  (meta)data  include  qualified  references  to  other  (meta)data

R1.2.  (meta)data  are  associated  with  detailed  provenance 

In  the  above  section,  we  described  potential  ways  in  which  the  identifier  of  a 
dataset  can  be  embedded  in  a  traditional  geospatial  metadata  document.  How¬ 
ever,  an  important  consideration  in  the  context  of  VGI  is  the  rather  complex 
and  laborious  nature  of  generating  such  ‘traditional’  metadata  documents, 
which  require  a  significant  investment  of  time  and  effort.  Geospatial  metadata 
standards  such  as  ISO  19115/19157  and  FGDC  offer  a  rich  and  expressive  range 
of  descriptive  elements,  but  the  reality  is  that  many  VGI  initiatives  are  unlikely 
to  generate  such  detailed  documentation.  In  the  face  of  this  reality,  other,  more 
lightweight  alternatives  are  likely  to  be  taken  up. 

In  those  cases  where  metadata  that  are  compliant  with  the  ISO  standard 
are  generated,  there  is  a  huge  opportunity  for  documenting  provenance  in  a
machine-readable  way  that  can,  if  necessary,  encode  a  full  production  workflow.
The  Lineage  element  of  an  ISO  document,  stored  as  part  of  the  data  quality
statement,  permits  the  description  of  any  number  of  processing  steps,  complete
with  references  to  input  and  output  data,  descriptions  of  algorithms  or
software  processing  and  citations  of  published  reports/articles22.  Figure  1  shows
a  single  ProcessStep  taken  from  such  a  lineage  statement,  rendered  in  a  more
human-readable  format.  It  consists  of  a  description  of  the  processing  that  was
carried  out,  and  the  three  data  sources  (all  of  which  may  be  optionally  identified
with  persistent  identifiers)  that  were  used  in  the  processing.

The  standard  and  schema  implementations  of  ISO  19115/19157  allow  for  a 
series  of  such  ProcessSteps  to  be  combined  to  generate  a  highly  detailed,  and,  to 
some  extent,  machine-readable  description  of  a  dataset’s  provenance.  However, 
in  practice,  the  rich  array  of  available  elements  is  rarely  used  as  intended,  and
it  is  far  more  common,  if  a  lineage  statement  is  provided  at  all,  to  see  a  single 
ProcessStep  with  a  long  and  descriptive  text  account  of  the  means  by  which  the 
data  were  produced.  This  is  in  part  because  of  the  basic  nature  of  many  edit¬ 
ing  tools  for  ISO  metadata  and  the  lack  of  best-practice  examples,  but  it  is  also 
evidence  of  the  investment  required  to  generate  detailed  metadata  compliant
with  standards,  and  of  the  fact  that  this  investment  is  not  always  budgeted  into
research  projects  -  especially  not  citizen  science  projects.  The  FGDC  approach 
to  documenting  data  provenance  is  simpler,  relying  primarily  on  citations  to 
scientific  papers  rather  than  on  a  fully  modular  description  of  the  processing, 
but  it  is  still  common  to  find  FGDC-compliant  metadata  with  no  real  informa¬ 
tion  on  data  provenance. 
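
The contrast between a modular lineage and a single free-text step can be sketched with a small Python data model. The classes below only mirror the general shape of a lineage statement (steps, each with a description and sources); they are our own simplification rather than the ISO schema, and the step descriptions and the identifier are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    description: str
    identifier: Optional[str] = None  # persistent identifier, if one exists


@dataclass
class ProcessStep:
    description: str
    sources: List[Source] = field(default_factory=list)


@dataclass
class Lineage:
    steps: List[ProcessStep] = field(default_factory=list)

    def unidentified_sources(self) -> List[str]:
        """Descriptions of all sources that lack a persistent identifier."""
        return [
            source.description
            for step in self.steps
            for source in step.sources
            if source.identifier is None
        ]


# A modular lineage: each step names its inputs (hypothetical content).
modular = Lineage(steps=[
    ProcessStep(
        "Geometric and radiometric correction",
        [Source("Satellite scenes", "doi:10.1234/scenes-example")],
    ),
    ProcessStep(
        "Supervised classification",
        [Source("Training areas"), Source("Corrected images")],
    ),
])

# The more common case: one step holding a long textual account.
monolithic = Lineage(steps=[
    ProcessStep("Images were corrected and then classified using training areas.")
])
```

Only the modular form can answer a question such as 'which inputs cannot be retrieved by identifier?' without a human reading the text.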

An  alternative,  or  potentially  a  complement,  to  traditional  geospatial  meta¬ 
data  is  a  Linked  Data  approach  (Heath  and  Bizer,  2011).  Here,  triples  (in  the 
form  of  subject-predicate-object)  are  used  to  describe  relationships  between 
entities.  This  mechanism,  further  discussed  in  Section  4.3,  extends  the  potential 
for  resource  discovery  to  off-the-shelf  web  browsers,  rather  than  just  specialised 
portals  and  catalogues.  Such  an  encoding,  which  is,  in  effect,  returning  to  the 
roots  of  Geography  Markup  Language  (GML)  -  GML  version  1.0  came  with 
an  encoding  in  the  Resource  Description  Framework  (RDF)  -  can  be  adapted 
to  include  provenance  information  on  a  dataset.  This  strategy  is  of  particular 
interest  because  it  could  be  used  to  improve  or  enrich  data  documentation  after 
data  are  published,  or  when  they  are  reused  for  a  different  purpose  than  the 
original  intended  use  case.  For  example,  user  reviews,  reports  of  usage,  discov¬ 
ered  issues  relating  to  particular  observations,  spatial  regions  or  observers  could 
be  attached,  post-hoc,  to  a  published  dataset  and  used  in  filtering  and  assessing 
fitness-for-purpose.  Initial  research  along  these  lines  can  be  seen  in  the  outputs 
of  the  CHARMe  project23,  which  adapted  the  proposed  OGC  Geospatial  User 
Feedback  standard  (Maso  and  Bastin,  2015)  to  permit  lightweight  annotations 
to  be  added  to  climate  data  in  order  to  document  quality  issues,  anomalies  and 
user  opinions  on  the  value  of  the  data.  Another  promising  approach  is  the  use

ProcessStep

    description

        Discriminant  Analysis  [DA]  involves  a  linear  combination  of  the
        original  variables  to  produce  a  new  set  of  variables  that  maximise
        the  statistical  difference  between  the  predefined  groups.  DA  acts  as  a
        standard  classifier  (applied  to  each  date)  because  it  enables  an
        unknown  pixel  to  be  assigned  to  one  of  the  predefined  classes  using
        discriminant  functions  obtained  from  a  set  of  training  areas.

        Training  areas  were  obtained  from  fieldwork  carried  out  in  the  Ebro
        Delta  on  29  October  2006  and  17  January  2007.  The  surface  of  the
        training  areas  collected  during  fieldwork  was  79.7  ha  and  40%  of
        them  were  reserved  for  an  independent  test  of  the  results  (random
        sampling).

    source

        description

            Training  areas  collection  (47.8  ha):  Several  sites  representative  of
            each  class  were  visited  and  georeferenced  with  the  aid  of  a  Global
            Positioning  System  (Garmin  etreX  VistaC,  Garmin  International,
            Olathe,  KS,  USA),  the  cadastre  cartography  and  the  most  recently
            available  Landsat  image.

    source

        description

            Test  areas  collection  (31.9  ha):  Several  sites  representative  of
            each  class  were  visited  and  georeferenced  with  the  aid  of  a  Global
            Positioning  System  (Garmin  etreX  VistaC,  Garmin  International,
            Olathe,  KS,  USA),  the  cadastre  cartography  and  the  most  recently
            available  Landsat  image.

    source

        description

            Each  of  the  5  previous  Landsat-5  images  after  geometric  and
            radiometric  correction  and  SIGPAC  masking

Fig.  1:  The  content  of  a  ProcessStep  in  an  ISO  19115  metadata  document.
Namespaces  and  XML-specific  formatting  have  been  removed  for  clarity.


of  the  W3C  PROV  specifications  in  combination  with  RDF  triples  to  create  que- 
ryable  databases  representing  the  steps  by  which  a  dataset  has  been  generated. 
A  particular  advantage  of  this  approach  is  its  amenability  to  extension  when 
products  are  derived  by  some  process  which  needs  to  be  documented.  In  par¬ 
ticular,  the  documentation  of  uncertainty  introduced  by  data  processing  has 
been  explored  by  Car  et  al.  (2015),  who  combined  UncertML  (Williams  et  al., 
2009)  -  a  model  and  schema  for  documenting  probabilistic  uncertainty  -  with 
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the  PROV-O  provenance  ontology  in  such  a  way  that  quality  issues  in  multi-part 
datasets  can  be  encoded,  and  automated  uncertainty  propagation  is  made  much 
more  feasible. 
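
To illustrate how provenance expressed as triples becomes queryable, the
following minimal sketch stores a few statements as plain Python tuples and
walks them to list everything a product depends on. The predicate names are
taken from the W3C PROV-O ontology; the example.org resources and the
upstream_of helper are hypothetical, and a real deployment would more likely
use a triple store queried with SPARQL.

```python
# Predicates from the W3C PROV-O ontology.
PROV = "http://www.w3.org/ns/prov#"
# Hypothetical namespace for the illustrative resources below.
EX = "http://example.org/"

# Each tuple is (subject, predicate, object), loosely echoing Fig. 1.
TRIPLES = [
    (EX + "landcover_map", PROV + "wasGeneratedBy", EX + "classification"),
    (EX + "classification", PROV + "used", EX + "landsat_images"),
    (EX + "classification", PROV + "used", EX + "training_areas"),
    (EX + "landsat_images", PROV + "wasDerivedFrom", EX + "raw_scenes"),
]

LINEAGE_PREDICATES = {
    PROV + "wasGeneratedBy",
    PROV + "used",
    PROV + "wasDerivedFrom",
}


def upstream_of(resource, triples):
    """Return every resource reachable from `resource` via lineage predicates."""
    found = set()
    frontier = [resource]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if (subject == current
                    and predicate in LINEAGE_PREDICATES
                    and obj not in found):
                found.add(obj)
                frontier.append(obj)
    return found


lineage = upstream_of(EX + "landcover_map", TRIPLES)
```

Because the lineage is just a set of statements, documenting a further
processing step only requires appending triples; the query itself does not
change.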

F4.  (meta)data  are  registered  or  indexed  in  a  searchable  resource 
A2.  metadata  are  accessible,  even  when  the  data  are  no  longer  available 

The geospatial community has widely adopted the use of catalogues, which can
be harvested, aggregated and searched in order to yield metadata that in turn
reference the location of data resources. In many cases, the data referenced in
these metadata documents are no longer available at the specified locations -
though this is usually an accidental result of poor curation, rather than a
demonstration of conscious compliance with principle A2. The prevalent standard
underlying geospatial catalogues is the OGC’s Catalogue Service standard24,
of which there are many free and open-source implementations, including the
Java-based GeoNetwork and the Python implementation pycsw. Acknowledging
that the OGC and SDI community to a large extent complements mainstream
Internet developments through specific additions and extensions, the
provision of metadata in the form of indexing files for common Internet search
engines should also be considered.
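
As a minimal sketch of how a metadata record might be fetched from such a
catalogue, the snippet below builds a key-value GetRecordById request for
version 2.0.2 of the Catalogue Service for the Web. The endpoint and the
record identifier are hypothetical placeholders, and a given catalogue may
support a different set of output schemas.

```python
from urllib.parse import urlencode


def csw_get_record_url(endpoint, record_id):
    """Build a CSW 2.0.2 GetRecordById request URL (key-value encoding)."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecordById",
        "id": record_id,
        "elementSetName": "full",
        # Ask for ISO 19115/19139 metadata rather than the default record type.
        "outputSchema": "http://www.isotc211.org/2005/gmd",
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical catalogue endpoint and record identifier.
url = csw_get_record_url("https://catalogue.example.org/csw", "urn:uuid:1234")
```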

A1. (meta)data are retrievable by their identifier using a standardized
communications protocol

A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure,
where necessary

As described above, a variety of free and open standards exist for the search
and retrieval of metadata from catalogues through an identifier. In terms of
data service protocols, a powerful and widely adopted set of standards has
been agreed to and maintained by the OGC: namely, the Web Map Service
(for images), Web Feature Service (for data about geospatial objects) and Web
Coverage Service (for data about geospatial fields). These standards are widely
used, and implemented in a variety of languages and off-the-shelf toolkits such
as GeoServer, MapServer, THREDDS and GeoNode, which are free to install
and require relatively little configuration effort on the part of a user. When
accessing data or imagery via OGC services, a simple HTTP request is
parameterised with various user-specified options such as the area of interest
and the projection in which the data should be returned. However, it is not
specifically the identifier of the data that is used to identify the resource
of interest; more commonly, one or more URLs are embedded in the metadata
document, incorporating the layer name and namespace and enabling the
retrieval of the resource from the service in question, which may not
incorporate that unique identifier at all. For example, a typical WFS request
contains a parameter with a namespace and layer name defining the data to be
retrieved (e.g. ‘typeName=lrm:wdpa_latest’), but there is no requirement to use
a persistent identifier for the layer name.
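
The parameterised request described above can be sketched as follows for a
WFS 1.1.0 GetFeature call. The endpoint and bounding box are hypothetical, the
layer name is the one quoted in the text, and the coordinate order expected in
the bbox parameter depends on the service version and coordinate reference
system, so it should be checked against the server in question.

```python
from urllib.parse import urlencode


def split_type_name(type_name):
    """Split 'namespace:layer' into its parts; namespace is None if absent."""
    if ":" not in type_name:
        return None, type_name
    namespace, layer = type_name.split(":", 1)
    return namespace, layer


def wfs_get_feature_url(endpoint, type_name, bbox, srs="EPSG:4326",
                        max_features=10):
    """Build a WFS 1.1.0 GetFeature request URL (key-value encoding)."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,  # layer name, not a persistent identifier
        "srsName": srs,
        "bbox": ",".join(str(value) for value in bbox),
        "maxFeatures": max_features,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and area of interest.
url = wfs_get_feature_url(
    "https://maps.example.org/wfs", "lrm:wdpa_latest", (0.5, 40.5, 1.0, 41.0)
)
```

Note that nothing in the request refers to the metadata identifier of the
dataset: if the layer is renamed on the server, the URL stored in the metadata
silently stops resolving.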

Authorisation  and  authentication  are  possible  with  some  implementations  of 
these  standards,  for  example  GeoServer25. 

I1. (meta)data use a formal, accessible, shared, and broadly applicable
language for knowledge representation

I2. (meta)data use vocabularies that follow FAIR principles
R1.3. (meta)data meet domain-relevant community standards
F2. data are described with rich metadata

R1. meta(data) are richly described with a plurality of accurate and
relevant attributes

In order to represent the knowledge of data producers, some clear and well
structured approaches have been developed. These identify core sets of vital
information which must be provided, and supplement these cores with
optional descriptive elements that can enrich the metadata and assist in
assessment of fitness-for-purpose. For example, both ISO and FGDC standards
have a subset of compulsory elements without which the metadata are invalid,
and a wide array of optional descriptors that can be extremely detailed - for
example, reports on quality, representativity, licensing and data provenance.
Thus these standards support the generation of rich and informative metadata.
In order to make these metadata more easily machine-readable and avoid large
amounts of text mining, many elements can be populated with strings selected
from code lists, which map to defined meanings in vocabularies and may be
further mapped to terms in other vocabularies. A good example of this is the
‘occurrence issue’ vocabulary used by GBIF to describe potential problems with
a record, ranging from swapped coordinates to incorrectly inferred country
origin for a record. Using values constrained by this list, extremely detailed
information about quality assurance can be recorded in a very systematic way,
which enables easy filtering and querying of records based on the nature of
their errors, and avoids confusion where different assessors might describe an
issue using different technical terms26.
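
The benefit of a controlled issue vocabulary can be sketched with a few lines
of filtering code. The issue codes below follow the naming of GBIF’s occurrence
issue vocabulary as we understand it; the record structure and the two helper
functions are hypothetical and much simpler than real GBIF responses.

```python
from collections import Counter

# Illustrative records; "issues" holds codes from a controlled vocabulary.
records = [
    {"key": 1, "issues": []},
    {"key": 2, "issues": ["PRESUMED_SWAPPED_COORDINATE"]},
    {"key": 3, "issues": ["COUNTRY_DERIVED_FROM_COORDINATES",
                          "COORDINATE_ROUNDED"]},
    {"key": 4, "issues": ["COORDINATE_ROUNDED"]},
]


def without_issues(records, blocking):
    """Keep only records that carry none of the blocking issue codes."""
    blocking = set(blocking)
    return [r for r in records if blocking.isdisjoint(r["issues"])]


def issue_counts(records):
    """Count how often each issue code occurs across all records."""
    return Counter(issue for r in records for issue in r["issues"])


usable = without_issues(
    records,
    ["PRESUMED_SWAPPED_COORDINATE", "COUNTRY_DERIVED_FROM_COORDINATES"],
)
counts = issue_counts(records)
```

Because every assessor uses the same codes, the filter is an exact set
comparison rather than a search through free-text remarks.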

Similar vocabularies have been devised for ISO standards27 and for taxonomic
terms that allow the FGDC standard to be extended to cover biological
data28. This last point is another strength of these agreed standards: they can
be profiled to produce domain-relevant standards, while core elements remain
consistent and interoperable with metadata produced using the base standard.
In the context of GBIF, the Darwin Core standard, which is fundamental
for structuring and harmonising species occurrence data, has been recently
extended with new elements that permit the representation of sample data
reporting species abundance information29.


4 Representative Examples of Cross-Community
Interoperability Approaches

In the considerations so far, GBIF has already been referred to as a good
example to learn from. In addition to highlighting some features of its
underlying approach, we see value in including two more examples in order to
cover a wider spectrum of existing (or emerging) good practices in VGI data
management.


4.1  The  GBIF  Data  Publishing  Framework 

GBIF30 was founded in 2001 upon a recommendation of the Biodiversity
Informatics Subgroup of the Megascience Forum and a subsequent endorsement by
the OECD science ministers, to ‘enable users to navigate and put to use vast
quantities of biodiversity information, advancing scientific research ... serving
the economic and quality-of-life interests of society, and providing a basis from
which our knowledge of the natural world can grow rapidly and in a manner
that avoids duplication of effort and expenditure.’31

Since then, GBIF has established a renowned cross-community data and
metadata infrastructure that functions as a single point of access to hundreds
of institutions and services offering biodiversity data. It is based upon a
data publishing framework advised by the GBIF Data Publishing Framework Task
Group, whose central recommendation is that ‘all data relevant to the
understanding of biodiversity and to biodiversity conservation should be made
freely, openly and effectively available’ (Moritz et al., 2011). GBIF
facilitates responsible use and sharing of data by emphasising the need for
proper publishing and citation, and by citing contributing nodes as data
curators. It claims to offer data about more than 1.6 million species,
collected in 300 years of exploration, from volunteers, researchers and
monitoring programmes (see the organisation’s ‘what is GBIF’ website section32
and the GBIF Data Policy33).

As a mature and open infrastructure, the GBIF architecture supports several
standards, the most important ones being Darwin Core, Ecological Metadata
Language (EML34), Access to Biological Collections Data (ABCD35) for metadata
and also access protocols like TDWG Access Protocol for Information Retrieval
(TAPIR36) and Distributed Generic Information Retrieval (DiGIR37), in order
to register and connect hundreds of different data holders and service providers
within the GBIF portal. Most of the ‘biodiversity standards’ are being developed
in the context of the Taxonomic Databases Working Group (TDWG)38.

The  principal  workflow  within  the  GBIF  (2011)  infrastructure  is  described 
as  follows: 

1. Digitization: The initial capturing of information in electronic form,
through imaging, databasing, maintaining spreadsheets etc.

2. Publishing: The act of making data sources available in a well known
format (standard) and with appropriate metadata for access on the internet.

3. Integration: The process of aggregating published datasets, applying
consistent quality control routines and normalizing formats.

4. Discovery and access: By building network wide indexes, discovery
services are offered for users through portals and for machines by extensive
web service APIs (GBIF, 2011).39

In order to collect standardised information from contributing nodes, GBIF
offers its community several tools, the most prominent one being the
Integrated Publishing Toolkit (IPT):

The  IPT’s  two  primary  functions  are  to 

1)  encode  existing  species  occurrence  datasets  and  checklists,  such  as 
records  from  natural  history  collections  or  observations,  in  the  Darwin 
Core  standard  to  enhance  interoperability  of  data,  and 

2)  publish  and  archive  data  and  metadata  for  broad  use  in  a  Darwin  Core 
Archive,  a  set  of  files  following  a  standard  format  (Robertson  et  al., 
2014). 
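
The first of these functions, encoding observations using Darwin Core terms,
can be sketched as below. The column names are Darwin Core terms; the input
record fields, the identifier scheme and both helper functions are
hypothetical. This is not the IPT itself, and a complete Darwin Core Archive
would additionally bundle a descriptor file and dataset metadata alongside the
data file.

```python
import csv
import io

# Darwin Core terms used as column headers of the occurrence table.
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]


def to_darwin_core(record):
    """Map a project-specific record (hypothetical keys) to Darwin Core terms."""
    return {
        "occurrenceID": f"urn:example:obs:{record['id']}",
        "basisOfRecord": "HumanObservation",
        "scientificName": record["species"],
        "eventDate": record["seen_on"],
        "decimalLatitude": f"{record['lat']:.5f}",
        "decimalLongitude": f"{record['lon']:.5f}",
    }


def write_occurrence_csv(records):
    """Serialise records as CSV text with Darwin Core column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(to_darwin_core(record))
    return buffer.getvalue()


sightings = [
    {"id": 7, "species": "Ardea purpurea", "seen_on": "2016-05-14",
     "lat": 40.7, "lon": 0.75},
]
occurrence_csv = write_occurrence_csv(sightings)
```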

A further functionality is the possibility to convert metadata into ‘data papers’
that may be published as peer-reviewed scholarly articles in a journal. This is a
direct incentive for publishing, as data can then be cited, raising the profile of
the researcher or institution40. It also encourages the user to directly choose a
public domain licence for the data (which is in line with GBIF’s data policy and
also leads to easier reuse of the data; see FAIR principles in previous section).

The Integrated Publishing Toolkit is one prominent example of how GBIF
tries to lower the barriers for new data publishers and to promote this
community’s standards.


4.2  The  OGC  Interoperability  Program,  Cross  Community 
Interoperability 

VGI data often lack a shared understanding of their meaning, or are
contributed by users without any specific purpose, for example via social media
platforms such as Twitter and Flickr. Nonetheless, these data often contain a
geographic reference and are tagged with other useful and queryable
information, and the social media platforms offer application programming
interfaces (APIs) through which their services can be harvested. In
photo-community platforms, for example, the position of the published image may
be (sometimes unintentionally) recorded in the GPS tags of EXIF metadata. This
is likely to increase with the widespread use of smartphones equipped with
capable GPS sensors. These sensors may eventually provide even more
sophisticated information - for example, orientation and tilt angle of the
camera. Such ancillary information is useful in a wide variety of use cases:
for example as additional ‘ground truth data’ in the validation of global land
cover products, or as one source among others in real-time crisis management.
Several authors (Goodchild, 2007; Jürrens et al., 2009; Schade et al., 2011)
have suggested viewing citizens [or humans] as sensors and using the OGC Sensor
Web Enablement (SWE) as a reference framework to describe these sensors and
their readings (or observations). In short, this framework aims at making
sensor readings of all kinds discoverable and accessible via the net as near
real-time streams in a standardised way, thus allowing for, e.g., additional
information streams beyond authoritative data from satellite images (in the
case of crisis response, for example). The SWE consists of a set of relevant
standards, for example:

• O&M - Observations and Measurements: This standard describes the general
data model and specifies XML encodings on how to represent data.

• SOS - Sensor Observation Service: The standard description of the service
offering sensor descriptions and their observations.

• SensorML - Sensor Model Language: The standard models and XML
Schema for describing the processes within sensor and observation
processing systems.

(See  the  OGC  website’s  Sensor  Web  Enablement  description41  for  details.) 

The  data  model  of  O&M  is  generic  in  the  sense  that  its  core  element,  an 
observation  event,  can  be  mapped  against  all  kinds  of  physical  properties: 

‘An  observation  is  an  act  associated  with  a  discrete  time  instant  or  period 
through  which  a  number,  term,  or  other  symbol  is  assigned  to  a  phenomenon. 
It  involves  application  of  a  specified  procedure,  such  as  a  sensor,  instrument, 
algorithm,  or  process  chain.  The  procedure  may  be  applied  in  situ,  remotely, 
or  ex  situ  with  respect  to  sampling  location.  The  result  of  an  observation  is  an 
estimate  of  the  value  of  a  property  of  some  feature’  (Cox,  2013). 
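
The generic character of this definition can be illustrated with a small
sketch in which an instrument reading and a citizen report share one
structure. The attribute names mirror the core O&M concepts (procedure,
observed property, feature of interest, phenomenon time, result); the class
itself and all identifiers are hypothetical and far simpler than the XML
encodings defined by the standard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Observation:
    procedure: str            # sensor, instrument, algorithm or human observer
    observed_property: str    # the phenomenon being estimated
    feature_of_interest: str  # the real-world feature the result refers to
    phenomenon_time: datetime
    result: Any               # number, term or other symbol


# A conventional sensor reading (all identifiers are made up).
sensor_reading = Observation(
    procedure="urn:example:sensor:gauge-12",
    observed_property="water_level_m",
    feature_of_interest="urn:example:feature:river-reach-3",
    phenomenon_time=datetime(2016, 9, 1, 12, 0, tzinfo=timezone.utc),
    result=1.84,
)

# A citizen acting as the 'procedure', reporting a term instead of a number.
citizen_report = Observation(
    procedure="urn:example:observer:volunteer-7",
    observed_property="land_cover_class",
    feature_of_interest="urn:example:feature:parcel-118",
    phenomenon_time=datetime(2016, 9, 1, 12, 5, tzinfo=timezone.utc),
    result="rice field",
)
```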

In a series of so-called testbeds, the OGC Interoperability Program (IP)
addresses fundamental questions regarding testing, prototyping and early
adoption of OGC standards. These testbeds consist of several threads in
specific application domains, such as aviation. In one of these threads - on
Cross-Community Interoperability (CCI) - the OGC has taken up the idea of
mapping VGI information against the O&M data model (see the Testbed 10 CCI VGI
Engineering Report (OGC, 2014)). By transforming social media content into
the O&M data model, the data can further be served by OGC service
components in a standardised way, as observations made by the human observer,
by using the Sensor Observation Service (SOS). The testbed report also states
some real-world problems: the prototype was tested against several clients,
some of which could not deal with the SOS interface (at the time of writing
SOS is not yet as widespread as the Web Feature Service (WFS) interface), so
the data were also encoded as features for usage within a WFS. In this
scenario, the social media content was harvested by using the REST interface of
the service (Flickr in their example) and uploaded as observations to the SOS
after being transformed into the O&M model. This development was taken up as
‘SWE for Citizen Science’ as part of the discussions that led to the proposal of a
new OGC Domain Working Group on Citizen Science (that was adopted at the
OGC Technical Committee Meeting in September 2016).
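
The two encodings used in this scenario can be sketched as a pair of
transformations: harvested photo metadata is first mapped to an
observation-like structure, and that structure is then re-expressed as a
feature for clients that only understand feature services. The input keys, the
identifiers and the choice of a GeoJSON-style feature are assumptions made for
illustration; they are not the encodings used in the testbed.

```python
import json


def photo_to_observation(photo):
    """Map harvested photo metadata (hypothetical keys) to O&M-like concepts."""
    return {
        "procedure": f"urn:example:observer:{photo['owner']}",
        "observedProperty": "photo_tags",
        "featureOfInterest": {
            "type": "Point",
            "coordinates": [photo["lon"], photo["lat"]],
        },
        "phenomenonTime": photo["taken"],
        "result": photo["tags"],
    }


def observation_to_feature(observation):
    """Re-express an observation as a GeoJSON-style feature."""
    return {
        "type": "Feature",
        "geometry": observation["featureOfInterest"],
        "properties": {
            "observer": observation["procedure"],
            "observedProperty": observation["observedProperty"],
            "time": observation["phenomenonTime"],
            "result": observation["result"],
        },
    }


# A made-up harvested record.
photo = {
    "owner": "user42",
    "lat": 40.71,
    "lon": 0.72,
    "taken": "2016-09-01T12:05:00Z",
    "tags": ["flood", "road"],
}
observation = photo_to_observation(photo)
feature = observation_to_feature(observation)
feature_json = json.dumps(feature)
```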


4.3  The  Provision  of  OpenStreetMap  (OSM)  as  Linked  Data 

An interesting case builds on one of the most prominent VGI initiatives so far:
OpenStreetMap (OSM). In the provision of OSM as Linked Data (Stadler et al.,
2012), the traditional OSM dataset gets translated into a model that
implements the Linked Data paradigm using RDF. Technically, the OSM data are
periodically extracted from the official web page (openstreetmap.org),
transformed into an RDF representation and loaded into a publicly available
triple store that is essentially an RDF database. This processing is enabled by
the open licensing model of OSM.

Apart from changing the data model (i.e. data formats and structures that
are used to encode the points, lines, polygons, etc. that are used within OSM),
the transition to a Linked Data approach also provides a step change in respect
to (semantic) interoperability. While OSM defined its own structures and map
elements (features) that are at most known to its own community, RDF is a
recognised standard of the W3C and thereby well known to web developers
around the globe, i.e. far beyond the original OSM contributors and the
geospatial community. As such, datasets that are translated to so-called RDF
triples (subject-predicate-object) can be easily connected to other triples by
adding standard or self-defined relationships. In this way, datasets from
multiple providers become interconnected and can be cross-navigated within the
Linked Data Cloud42.
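
How a single linking statement lets two datasets be cross-navigated can be
sketched with triples held as plain tuples. The owl:sameAs and rdfs:label
predicates are standard; every other URI, the place name and the describe
helper are made up for illustration.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical predicate and resources for the illustration.
IN_REGION = "http://example.org/vocab/inRegion"
OSM_NODE = "http://example.org/osm/node/1"
GAZETTEER_ENTRY = "http://example.org/gazetteer/example-town"

# Triples from an OSM-derived dataset, including one link to another provider.
osm_triples = [
    (OSM_NODE, RDFS_LABEL, "Example Town"),
    (OSM_NODE, OWL_SAME_AS, GAZETTEER_ENTRY),
]

# Triples published independently by a second provider.
gazetteer_triples = [
    (GAZETTEER_ENTRY, IN_REGION, "Example Region"),
]


def describe(resource, *graphs):
    """Collect statements about a resource and everything declared sameAs it."""
    merged = [triple for graph in graphs for triple in graph]
    aliases = {resource}
    changed = True
    while changed:
        changed = False
        for subject, predicate, obj in merged:
            # Grow the alias set when exactly one end of a sameAs link is known.
            if predicate == OWL_SAME_AS and (subject in aliases) != (obj in aliases):
                aliases.update((subject, obj))
                changed = True
    return {
        (predicate, obj)
        for subject, predicate, obj in merged
        if subject in aliases and predicate != OWL_SAME_AS
    }


description = describe(OSM_NODE, osm_triples, gazetteer_triples)
```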

In addition to introducing a standard way of modelling and related
encodings, RDF also provides the possibility to reuse existing vocabularies so
that the expressions used to represent subjects, predicates and objects are
understood by many different communities (and not only by those that are
familiar with a particular VGI dataset, such as, in this case, OSM).
Considering geospatial data, for example, one might use the Location Core
Vocabulary43 for describing any place in terms of its name, address or
geometry. In a similar manner vocabularies exist to describe persons and their
social network44 or even relationships between terms in two different
vocabularies45. The most important point here is that the use of RDF is a well
established step towards breaking down the silos between closed communities,
such as the SDI or the VGI community (see also Schade and Smits, 2012).
Compared to many current OGC standards, which mostly evolve in parallel worlds,
RDF provides common ground for all sorts of different communities. This is
because RDF builds on the (semantic) web as the common denominator and enables
the specification of community-specific vocabularies, together with shared
terms and well defined mappings. The mechanisms of vocabulary reuse and
matching avoid the need for additional architectural approaches to join
information from separately operating communities, such as wrappers, brokers or
proxies.

While the above holds for all data models, it holds in particular for models
of data quality. Returning to the concrete example of OSM, the overall quality
assurance and data management mechanisms remain core business within the
traditional platform that underlies OSM (available from openstreetmap.org).
The architecturally loosely coupled Linked Data representation adds, for
example, the possibility to apply W3C vocabularies related to data quality -
most notably the W3C Data on the Web Best Practices: Data Quality Vocabulary
(DQV; W3C, 2016a) and Dataset Usage Vocabulary (DUV; W3C, 2016b). Whereas DQV
provides the means to describe ‘the quality of a dataset ..., whether by the
dataset publisher or by a broader community of users’ (W3C, 2016a), DUV
specifies ‘a number of foundational concepts used to collect dataset consumer
feedback, experiences, and cite references associated with a dataset’ (W3C,
2016b). Together, both vocabularies could also be used for VGI, in order to
support providers to express quality parameters of their offerings, but also to
enable users to add their experiences and feedback to these parameters.
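
As a minimal sketch of what a quality statement in this style might look like,
the function below writes a single quality measurement in Turtle syntax. The
class and property names are those of the W3C Data Quality Vocabulary as we
understand it; the dataset and metric URIs, the value and the helper function
are hypothetical, and no DUV terms are shown.

```python
def quality_measurement_turtle(dataset_uri, metric_uri, value):
    """Serialise one DQV quality measurement as Turtle text."""
    return "\n".join([
        "@prefix dqv: <http://www.w3.org/ns/dqv#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        "[] a dqv:QualityMeasurement ;",
        f"    dqv:computedOn <{dataset_uri}> ;",
        f"    dqv:isMeasurementOf <{metric_uri}> ;",
        f'    dqv:value "{value}"^^xsd:double .',
    ])


# Hypothetical dataset and metric identifiers with a made-up value.
turtle = quality_measurement_turtle(
    "http://example.org/dataset/osm-extract",
    "http://example.org/metric/positional-accuracy-share",
    0.87,
)
```

The same statement could be published by the data provider or attached later
by a user of the data, which is what makes the annotation style attractive for
VGI.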

Yet, at the time of writing, both of these best practices are only available
in draft versions and, to our knowledge, there is not yet a tangible
implementation of this approach in a VGI context. We consider this an
extremely exciting area that is worth exploring (and comparing to dedicated
OGC-centric approaches) in respect to VGI data management. The example
of OSM as Linked Data may be the most straightforward use case for testing
these possibilities.


5  Conclusion 

In this chapter, we have looked into some generic principles and good
practices of data management that are not specific to VGI projects, with the
central paradigm being the FAIR principles: data should be findable,
accessible, interoperable and reusable. To be reusable, it is vital that
(meta)data are released with a clear and accessible data usage licence (see
Chapter 6, Mooney and Minghini, 2017). Furthermore, we have summarised
standards that support these principles, both from the Open Geospatial
Consortium and from ISO/TC 211, as well as from W3C, and we have investigated
three examples where these principles and standards are utilised to maximise
cross-discipline interoperability.

A key conclusion from this review into the current state of the art is that
metadata for VGI are, and are likely to remain, patchy and extremely
heterogeneous. ‘Traditional’ standards aimed at complete documentation of a
one-off production workflow, such as ISO 19115/19157, are rich in descriptive
elements that, if used properly, can enable the provenance and quality of
geospatial data to be documented in very useful and machine-readable ways that
support uncertainty propagation and fitness-for-use assessment. However, an
investigation of open geospatial catalogues quickly shows that these standards
are not being exploited to their full potential, even by large institutional
data producers - partly because of the resource-intensive nature of metadata
generation, and partly because of an ongoing shortage of tools and examples to
simplify the process. For VGI, where even a single ‘dataset’ can contain
observations produced by a wide variety of observers, instruments and methods,
such monolithic standards may only be of use for periodic review and
documentation of aggregated and quality-controlled data. In addition, the
nature of VGI is such that observations may be accessed and used in a variety
of different combinations and groupings. With such a fluid granularity, tools
and APIs that allow annotation and documentation of individual records or
groups of records are likely to be more useful, as are any tools and processing
methods that permit the collection and storage of metadata automatically at
the point of observation. Ongoing developments in RDF and Linked Data
appear very promising for supporting data annotation, but are still too
immature to be easily usable within most VGI initiatives. However, this is a
key angle of research that should be developed, not least because the
annotation/commentary approach to metadata permits information and quality
reports to be attached to data after their production, so that VGI can be
mobilised and made more usable and reusable.

We have not looked into software solutions for accessing, storing and backing
up data, for example which database management solution to use, such as
PostgreSQL (with its spatial extension PostGIS), MySQL or the lightweight
SpatiaLite, to name a few. We have also only touched the surface of the topic
of software suites like GeoServer, deegree or GeoNetwork, all of which offer
substantial building blocks for Spatial Data Infrastructures. We encourage the
use of Open Source software like these, as well as open and freely accessible
standards.

In this text we have not addressed Environmental Sensor Networks (ESNs)
that may comprise a backbone in data assessment from distributed
heterogeneous sensors. We expect that the Sensor Web Enablement, as an OGC
reference framework, will play an important role in citizen sensing. For
further reading, the FP7-funded Citizen Observatory ‘COBWEB’ has defined a
‘Generic Infrastructure Platform to facilitate the collection of Citizen
Science data for Environmental Monitoring’ (Higgins et al., 2016).

In terms of actual formulation of Data Management Plans, substantial
resources are available; see for example DataOne’s ‘Data Management Guide for
Public Participation in Scientific Research’46 or COBWEB’s ‘Generic Data
Management Plan Check’ in their ‘deliverable 7.1 on Data Management
Guidelines.’47

Data management methodologies can only succeed if their benefits outweigh
their implementation costs; i.e. existing solutions and best practices will have
to be tailored to the needs and capabilities of individual projects, and
feasibility needs to be assessed on a case by case basis. However, it is
imperative to recognise that a precise knowledge of the provenance and meaning
of data is a most precious asset that should be highly valued.


Notes 

1  https://ec.europa.eu/programmes/horizon2020/en/news/citizens%E2%80%99-observatories-empowering-european-society-open-conference

2  http://www.eubon.eu/ 

3  https://cobwebproject.eu/ 

4  https://www.earthobservations.org/documents/dswg/201504_data_management_principles_long_final.pdf

5  http://www.bfe-inf.org/info/data-principles 

6  E.g.  http://vocab.nerc.ac.uk/collection/P15/current/CFCM0010/ 

7  E.g. http://seadatanet.maris2.nl/v_bodc_vocab_v2/vocab_relations.asp?lib=P02

8  http://ijsdir.jrc.ec.europa.eu/index.php/ijsdir/article/view/389 

9  https://www.journals.elsevier.com/data-in-brief 

10  https://www.w3.org/ 

11  http://www.opengeospatial.org/ 

12  http://www.gbif.org/mendeley 

13  http://www.iso.org/iso/catalogue_detail?csnumber=43506 

14  https://www.crossref.org/ 

15  https://figshare.com/ 

16  https://zenodo.org/ 

17  https://www.datacite.org/ 

18  http://ezid.cdlib.org/ 

19  http://datadryad.org/ 

20  http://www.iso.org/iso/catalogue_detail.htm?csnumber=53798 

21  http://www.ngdc.noaa.gov/wiki/index.php/Data_Set_Identifiers_and_other_Unique_IDs

22  https://geo-ide.noaa.gov/wiki/index.php?title=File:LI_Lineage-2.png

23  http://charme.org.uk/ 

24  http://www.opengeospatial.org/standards/cat 

25  http://docs.geoserver.org/stable/en/user/security/service.html 

26  http://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/OccurrenceIssue.html

27  https://geo-ide.noaa.gov/wiki/index.php?title=ISO_19115_and_19115-2_CodeList_Dictionaries

28  http://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/biometadata/biodatap.pdf

29  http://www.gbif.org/sites/default/files/gbif_IPT-sample-data-primer_en.pdf

30  http://www.gbif.org 

31  http://www.gbif.org/what-is-gbif#background

32  http://www.gbif.org/what-is-gbif

33  http://www.gbif.org/resource/80527 

34  https://knb.ecoinformatics.org/#external//emlparser/docs/index.html

35  http://www.tdwg.org/activities/abcd/ 

36  http://www.tdwg.org/activities/tapir/ 

37  http://digir.sourceforge.net/ 

38  http://www.tdwg.org/standards/ 

39  http://www.gbif.org/infrastructure/summary 

40  http://www.gbif.org/publishing-data/data-papers 

41  http://www.opengeospatial.org/ogc/markets-technologies/swe 

42  http://lod-cloud.net/ 

43  https://www.w3.org/ns/locn 

44  http://www.foaf-project.org/ 

45  https://www.w3.org/2004/02/skos/ 

46  https://www.dataone.org/sites/all/documents/DataONE-PPSR-DataManagementGuide.pdf

47  https://cobwebproject.eu/sites/default/files/COBWEB%20D7.1%20Data%20Management%20Guidelines%20v1_0.pdf


Abstract

Spatial  Data  Infrastructures  (SDIs)  are  a  special  category  of  data  hubs  that 
involve  technological  and  human  resources  and  follow  well  defined  legal  and 
technical  procedures  to  collect,  store,  manage  and  distribute  spatial  data. 
INSPIRE  is  the  EU’s  authoritative  SDI  in  which  each  Member  State  provides 
access  to  their  spatial  data  across  a  wide  spectrum  of  data  themes  to  support 
policy-making.  In  contrast,  Volunteered  Geographic  Information  (VGI)  is  one 
type  of  user-generated  geographic  information  (GI)  where  volunteers  use  the 
web and mobile devices to create, assemble and disseminate spatial
information. There are similarities and differences between SDIs and VGI, as
well as advantages and disadvantages to both. Thus, the integration of these
two data sources will enhance what is offered to end users to facilitate
decision-making. This idea of integration is in its early stages, because
several key issues need to be considered and resolved first. Therefore, this
chapter discusses the challenges of integrating VGI with INSPIRE and outlines a
generic framework for a
global  integrated  GIS  platform,  similar  in  concept  to  Digital  Earth  and  Virtual 
Geographic  Environments  (VGEs),  as  a  realistic  scenario  for  advancements  in 
the  short  term. 


Keywords 

SDIs,  INSPIRE,  VGI,  Global  Integrated  GIS  platform 


1  Introduction 

Data hubs have arisen through the evolution of information technology, and
aim to provide a centralised, unified data source that can be easily accessed
by certain groups of users, or more widely by the public, to support a diversity
of professional and/or other needs (Mangano, 2013). A special category
of data hub is that of Spatial Data Infrastructures (SDIs; Williamson et al.,
2003), which emerged during the mid-1990s (Delaney and Pettit, 2014). SDIs
involve technological and human resources that follow well defined legal and
technical procedures to collect, store, manage and distribute spatial data. On
14 March 2007, the European Parliament and Council adopted a Directive
establishing the Infrastructure for Spatial Information in the European
Community (INSPIRE) as the European SDI (European Commission, 2007). Under
the INSPIRE Directive, Public Authorities (PAs) in each Member State should
provide access to their SDI across a wide spectrum of data themes through
a community geoportal, in order to support policy-making and activities
aimed at, but not limited to, the protection of the environment.

Whilst INSPIRE tries to unite and standardise existing Authoritative
Geographic Information (AGI) made available by PAs in EU Member States,
technologies that enable User-Generated Content (UGC) have also appeared
(Moens et al., 2014) in web-based platforms (e.g. blogs, wikis, discussion
forums, posts, chats, tweets), mobile computing and GPS devices. Hence,
users have started to create and share data and information. Volunteered
Geographic Information (VGI) is one type of user-generated GI (Goodchild,
2007), where volunteers use the web and mobile devices to create, assemble
and disseminate spatial information. Among the best known VGI platforms
are OpenStreetMap (OSM; Demetriou, 2016) and Wikimapia, but there
are many others, covering a range of fields such as conservation, planning, and
crisis management. Thus, there is a potential for VGI to become an important
source of information that could benefit INSPIRE and similar projects
and efforts; on the other hand, VGI could also benefit from INSPIRE through
integration with official and reliable data and through the adoption of
stricter specifications.

Although  INSPIRE1  is  a  well  organised,  official  and  reliable  platform  that  is 
based  on  strict  standards,  it  provides  data  that  are  mainly  used  by  experts  and 
involves  static  information  (with  a  limited  level  of  detail  in  some  cases)  that 
is  not  updated  very  regularly  because  of  the  high  costs  involved.  VGI,  on  the 
other  hand,  is  captured  unofficially  by  volunteers,  often  using  cheap  devices, 
e.g.  a  handheld  GPS  or  smartphones;  hence  the  data  quality  is  usually  limited 
and  the  data  collection  is  not  based  on  strict  standards.  However,  real-time  data 
can  be  collected  anywhere  by  anybody,  opening  up  concrete  possibilities  for 
data to be updated very regularly at little or no cost. Therefore, the integration
of both types of data (Craglia, 2007; Budhathoki et al., 2008; Craglia et al., 2008;
McDougall, 2009; Parker et al., 2012; Massa and Campagna, 2016) could potentially
enhance what is delivered to end users, supporting the full spectrum of
related needs, from professional ones, e.g. planning and spatial decision-making,
to the daily activities of citizens.

The idea of integrating VGI and authoritative data has arisen recently and
been emphasised by several researchers (Budhathoki et al., 2008; Craglia et al.,
2008; McDougall, 2009; Parker et al., 2012). The benefits of integration
accrue both to the organisations involved, i.e. National Mapping Agencies
(NMAs; Olteanu-Raimond et al., 2017) that operate national INSPIRE
geoportals and those who run VGI initiatives, and to the end users. Although
some efforts towards this integration have already been made (Craglia, 2007;
Wiemann and Bernard, 2014), the literature suggests that this endeavour is in
its early stages because several critical issues need to be considered and resolved.
As a result, the available literature is limited and focuses on specific projects or
technical issues (Botshelo, 2009) without attempting to investigate the broader
picture of integration or setting out a conceptual framework. Beyond this
integration, the vision is the development of a global integrated GIS platform, which
extends the capabilities of a typical data hub and the benefits of integration of
SDIs with VGI by embedding on-line geospatial tools, to deliver both static and
dynamic outputs to support planning and decision-making. Visionary and/or
applied geospatial tools and frameworks moving in this direction
include the GeoWeb (Dangermond, 2005), Digital Earth (Craglia et al., 2008) and
Virtual Geographic Environments (VGEs; Lin et al., 2013).

Based on the above, this chapter aims to discuss the challenges of integrating
VGI with INSPIRE, and to outline a generic framework for a global integrated
GIS platform, similar in concept to Digital Earth and VGEs, as a realistic
scenario for advancements in the future. The remainder of this chapter is
organised as follows: Section 2 provides an overview of SDIs and VGI, contrasting
these two sources of data. This is followed by a discussion about critical issues
that arise in INSPIRE and VGI integration (Section 3). In Section 4, the
prospects of integration are examined, with some examples. Section 5 then presents
an outline of a conceptual framework for an ideal global integrated GIS
platform, while conclusions are summarised in Section 6.


2  Spatial  Data  Infrastructures  (SDIs)  and  Volunteered 
Geographic  Information  (VGI) 

Before  discussing  the  various  issues  of  integration  between  SDIs  and  VGI,  an 
overview  of  each  infrastructure  and  a  comparison  are  presented,  providing  the 
necessary  background. 


2.1  Spatial  Data  Infrastructures  (SDIs) 

Data  hubs  are  defined  as  community-run  catalogues  of  useful,  online  datasets, 
which  store  a  copy  of  the  data  or  host  them  in  a  database  and  provide  some 
basic  visualisation  tools  (Open  Knowledge  Foundation,  2013).  A  typical  data 
hub  consists  of  four  basic  elements,  as  shown  in  Figure  1:  Data,  a  Facilitator, 
a  Custodian  and  End  Users,  which  together  form  a  dynamic  communication 
cycle  (Delaney  and  Pettit,  2014). 

In particular, the Facilitator should provide a connection between the
Custodian, i.e. the data hub’s administrator, and the End Users; negotiate with the
Custodian in terms of the needs or problems; and provide feedback to end
users. The role of the Custodian is to provide and distribute data, which will be
used by the End Users. Note that the terms ‘end users’ and ‘users’, as
used in this chapter, have slightly different meanings: while ‘end users’
utilise the data provided by the hub, they do not necessarily contribute to the
development of the hub voluntarily, i.e. by delivering new data, updating
existing data or sharing data; these tasks are carried out by ‘users’. Obviously, ‘users’
can also be ‘end users’; that is, they can have a double role.


Fig. 1: Data hub conceptual communication-feedback cycle (adapted from
Delaney and Pettit, 2014).

Access to data hubs can be free and/or licensed. A data hub allows users to
access, search and use a variety of data with associated metadata, provided in a
discrete set of formats. The data hub concept has been realised in many
locations and contexts globally. Many scientific fields have collaborated to create
research-specific data hubs to store and discover data and to distribute them to
other researchers (Delaney and Pettit, 2014).

SDIs are a special category of data hubs (Williamson et al., 2003) that involve
a framework of interacting elements, aiming to acquire, store, preserve,
process, distribute, use and maintain data with ‘a direct or indirect reference to a
specific location or geographical area’ (European Commission, 2007). The main
elements of this framework are: spatial datasets and their metadata; network
services and technologies; standards that define the quality of the data; policies
for distributing and managing the data; human resources; and a mechanism for
coordinating and monitoring the whole infrastructure (European Commission,
2007; Iliffe, 2012). An SDI may be developed by national public bodies to
support all of the spatially relevant activities in a country. Each national, regional
or local SDI, as a node of INSPIRE, recognises the significance of metadata by
ensuring all contributed data align to a minimum standard and aims to deliver
up-to-date data and information to other government agencies and the general
public (Steven, 2005) to support effective decision-making. Several SDIs have
been developed (Craglia, 2007), e.g. the National Spatial Data Infrastructure
(NSDI) in the United States in 1994 and INSPIRE in Europe.


2.2  Volunteered  Geographic  Information  (VGI) 

UGC is divided into two main types: non-georeferenced and georeferenced, as
illustrated in Figure 2. The most popular forms of the former type include text
messaging, social media interactions, photos, videos, blog entries, etc.
Georeferenced UGC involves various forms of location-based technologies, such as
location-based services (LBSs), location-based social networks (LBSNs), social
network location sharing (SNLS), location-based games (LBGs) and
location-based social network games (LBSNGs; Odobasic et al., 2013). In particular, the
LBS industry has profited from UGC primarily because ubiquitous and
affordable smartphones equipped with multiple sensors foster geographic data
collection. Similarly, LBSNs leverage the power and high adoption rate of modern
mobile devices to provide applications and services that allow users to share
and discuss the real-world places they visit, as a part of their virtual interactions
(Furey et al., 2013). In terms of social networks, location sharing has changed
from a purpose-driven to a social-driven activity. Users traditionally shared
their location with one other person (one-to-one) or with a small group
(one-to-few); social networks, depending on the privacy/user settings, enable users
to share their location with a large group (one-to-many) or with everyone
(one-to-all; Tang et al., 2010). LBGs are games in which the game play
evolves and progresses based on a player’s location. Thus, LBGs almost always
support some kind of georeferencing technology, for example WiFi,
Near Field Communication, Bluetooth or satellite positioning such as GPS.
The blend of LBGs and LBSNs creates LBSNGs, which are exemplified by a
service like Foursquare.

Fig. 2: Types of user-generated content.

Among the most popular geo-UGC-based technologies is VGI (Goodchild,
2007), or crowdsourced GI, which has arisen since 2007. VGI involves
harnessing tools to create, assemble and disseminate geographic data provided
voluntarily by individuals, and it can be generated through geobrowsers or
smartphone apps, making use of georeferencing or geocoding tools and techniques.
Two widely popular VGI platforms are OSM (Haklay, 2010) and Wikimapia
(Wikimapia, 2015), but there are many others, covering many kinds of fields,
such as conservation, planning, and crisis management. A special class of
VGI is Social Media Geographic Information (SMGI), which can generally be
divided into active and passive types (Figure 2). The former type is produced for
a given scope, e.g. citizen science, crowd mapping or public participation, and
users (i.e. volunteer contributors) are fully aware of this, such as in the case of
OSM or Wikimapia. In contrast, the latter is produced for other purposes (i.e.
users share passively or share unvolunteered information for undefined
purposes, such as in the case of social network interaction) and may be accessed
independently at a later stage for reuse by third parties for a variety of disparate
aims.


2.3  A  Comparison  of  SDIs  and  VGI 

There are similarities and differences between SDIs and VGI (Castelein et al.,
2010) regarding data, as well as advantages and disadvantages, and these are
outlined in Figure 3. In particular, data provided by SDIs are captured by well
trained specialists who are employed by formal public or private organisations,
and through well defined workflows, using state-of-the-art technology
(Castelein et al., 2010); hence the SDI approach is an official, top-down approach
involving high costs. On the other hand, VGI is captured unofficially by
volunteer-citizens (classified by Coleman et al. (2009) into five categories), through
smartphones/devices that provide GPS and Internet access or using other
simple aids to take measurements; it is a bottom-up process with limited or no
operational costs. Whilst the former data are generally free of charge or can be
licensed for a fee, the latter are always provided for free. Moreover, SDIs
have a data-centric scope as they mainly provide data used by experts through
GIS portals, while VGI delivers information to a broader audience of mainly
non-experts through user-friendly GI platforms.

In addition, SDIs involve static information provided periodically and in
some cases with a limited level of detail, while VGI has both static and dynamic
(real-time) information, since it can process real-time, spatiotemporal
information, and can provide a much greater level of detail in some cases. This
suggests that VGI could be a potentially complementary source to SDI in
providing relevant real-time data related to physical catastrophes, crisis management
situations or humanitarian missions. Furthermore, SDI provides certified data
based on strict and professional international standards and specifications
such as those provided by the Open Geospatial Consortium (OGC) and the
International Organization for Standardization (ISO), while VGI is based on essential data
standards that vary from platform to platform; most importantly, the quality of
VGI data is unknown.

The above comparison also reveals two weaknesses of SDIs: the lack of
capacity for real-time data to be collected anywhere by anybody and the lack of the
flexibility of very regular data updates at low or no cost. Thus, a combination of
both technologies will enhance what is offered to end users to facilitate
decision-making, and the idea of integration has been discussed by several researchers
(Budhathoki et al., 2008; Craglia et al., 2008; McDougall, 2009; Parker et al.,
2012). However, this challenge will not be an easy one, because the institutional
framework of the integration will be complex due to the different requirements
and scope underlying each technology.

Fig. 3: The differences between SDIs and VGI.


3 Integrating VGI to INSPIRE

The dominant European SDI is INSPIRE, and its integration with VGI is a
difficult task because of several critical issues regarding the common
implementing rules, which are discussed below. An overview of INSPIRE is first provided.


3.1  The  INSPIRE  Directive 

INSPIRE, which is defined by EU Directive 2007/2/EC (European
Commission, 2007) and was adopted in 2007, establishes the requirement that each
Member State should provide access to their SDI through a community
geoportal operated by the European Commission or any other access point they
wish to operate. The INSPIRE implementation provides a large-scale
application of the open geoportal environment and is a big step forward in the
development of an SDI in Europe. INSPIRE will overcome existing weaknesses and
gaps in the interoperability of information resources across Europe by
integrating them into a common framework (Craglia, 2007). The aim of INSPIRE is
to assist policy-making and activities related to the environment and beyond;
hence it involves data regarding a broad spectrum of fields, which are reflected
in 34 spatial-data themes. The INSPIRE implementation represents a
significant investment from all Member States, and has resulted in close to 300,000
spatial datasets being made available to the community through a standardised
data-discovery site. The main INSPIRE portal allows users to search for
datasets from across the EU from a single interface, and allows advanced search
filters to be used to narrow down searches by geography, format or spatial
theme. The INSPIRE portal only displays metadata for each dataset; it does not
allow users to directly access any of the datasets, either manually or
programmatically. However, each metadata resource contains a link to the data source,
which may be a file, service or web application.

It should be noted that INSPIRE involves some general rules: it is based on
the existing SDIs of Member States, and hence does not require the collection of
new data, but demands the transformation of existing data to comply with its
specification structure; and it does not affect intellectual property rights. In
particular,  the  Directive  also  requires  that  common  implementing  rules  be 
adopted  in  four  main  specific  areas:  metadata,  data  specifications,  network 
services,  and  data  and  service  sharing.  These  areas  face  critical  integration 
issues,  as  discussed  below. 


3.2  Critical  Issues  for  Integration 

Following the INSPIRE Directive, Member States should provide metadata
for spatial datasets/data series and/or for spatial data services. The metadata
consist of 27 elements of information regarding the data resources,
which are grouped into 10 categories: identification; classification;
keywords; geographic location; temporal reference; quality and validity;
conformity with the interoperability implementing rules; constraints related
to access and use; organisation responsible for the resource; and metadata for
metadata (European Commission, 2007). Clearly, populating all of these
elements of metadata for VGI data will entail considerable time and cost.
Furthermore, these elements cannot be gathered comprehensively by volunteers
given current VGI practices. An open issue is therefore who will be responsible for
inputting all of these metadata and validating their reliability. One practical
option is to limit VGI metadata to only the basic information among the 27 elements
provided by INSPIRE that can be input by the contributor, by the VGI system
administrator or automatically by the system.
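
The division of labour suggested above (some elements typed in by the contributor, some set once by the administrator, some derived automatically by the system) can be illustrated with a minimal Python sketch. The field names, the choice of six elements and the helper build_metadata are assumptions made purely for illustration; they are not taken from the INSPIRE metadata rules or from any VGI platform.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class VgiMetadata:
    """Hypothetical reduced metadata record for one VGI contribution."""
    # Entered by the volunteer contributor.
    title: str
    keywords: List[str]
    # Derived automatically by the system from the submitted geometry and clock.
    bounding_box: Tuple[float, float, float, float]  # west, south, east, north
    captured_at: datetime
    # Set once for the whole platform by the VGI system administrator.
    responsible_party: str
    access_constraints: str


def build_metadata(title: str,
                   keywords: List[str],
                   points: List[Tuple[float, float]],
                   admin_defaults: Dict[str, str],
                   captured_at: Optional[datetime] = None) -> VgiMetadata:
    """Combine contributor input, admin defaults and derived values."""
    if not points:
        raise ValueError("at least one (lon, lat) point is required")
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return VgiMetadata(
        title=title,
        keywords=list(keywords),
        bounding_box=(min(lons), min(lats), max(lons), max(lats)),
        captured_at=captured_at or datetime.now(timezone.utc),
        responsible_party=admin_defaults["responsible_party"],
        access_constraints=admin_defaults["access_constraints"],
    )
```

In such a design the contributor supplies only a title and keywords, which keeps the burden on volunteers low while still yielding a record that can be mapped onto a subset of the INSPIRE categories.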

Similarly to metadata, the employment of common data specifications is
a vital aspect of integration. Specifically, in order to ensure the
interoperability of spatial information in INSPIRE, common international standards (those
defined by ISO), technical specifications (e.g. regarding data types, code lists
and enumerations, encoding, updating, the life cycle of spatial objects,
temporal reference systems, and metadata) and minimum performance criteria for
download services and transformation services have been defined (for each of
the 34 related themes mentioned earlier). The issue of how to accommodate the
diversified, dynamic and easy-to-access VGI data types within an SDI is not a serious
problem in technical terms; the problem is to define and apply minimum data
requirements for VGI that are reasonable and achievable in order to satisfy data
quality requirements (Wiemann and Bernard, 2014). Aspects of data quality
such as positional accuracy, classification correctness and accuracy of the time
measurement may follow the ISO 19157 standard (ISO, 2013; see Chapter 7 by
Fonte et al. (2017) for more information on quality); a legally binding aspect is
that of the topological consistency of the network data. VGI data quality and
credibility vary from contributor to contributor (Flanagin and Metzger, 2008;
Goodchild and Li, 2012; Foody et al., 2013); thus it is up to each data provider
whether they will respect data quality recommendations and whether they will
report on them in the metadata. Although some case studies on
popular VGI platforms such as OSM have shown good and acceptable
outcomes (Haklay, 2010), NMAs should evaluate the risks and problems that arise
from the adoption of this new production system (Coleman et al., 2009; Begin,
2012). Users should always be aware of how they can assess the credibility of
data (Flanagin and Metzger, 2008) and contributors should be aware of the
quality of the data used (Dassonville et al., 2003) and of whether they are fit for
purpose. It is essential to develop tools that enable this evaluation. In addition,
data quality can be improved by providing training on the needs of SDIs and
on their protocols, and incentives can be awarded to contributors providing
good work (see Chapter 5 by Fritz et al. (2017) for a discussion of incentives
for volunteers).
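
As an illustration of the kind of evaluation tool mentioned above, the following Python sketch estimates one quality aspect, positional accuracy, as the root-mean-square error between VGI points and matched authoritative reference points. It is a simplified example under stated assumptions: the points are already paired one-to-one, coordinates are (longitude, latitude) in degrees, and distances use the haversine formula on a spherical Earth. It does not reproduce the evaluation procedures of ISO 19157.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat positions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def positional_rmse_m(vgi_points: Sequence[Point],
                      reference_points: Sequence[Point]) -> float:
    """RMSE (metres) of offsets between paired VGI and reference points."""
    if not vgi_points or len(vgi_points) != len(reference_points):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [haversine_m(v[0], v[1], r[0], r[1]) ** 2
               for v, r in zip(vgi_points, reference_points)]
    return math.sqrt(sum(squared) / len(squared))
```

A platform could compare such a value against a threshold agreed with the NMA before accepting a contribution, although choosing that threshold is a policy decision rather than a technical one.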
The interoperability of network services is also crucial for the joint
operation of the systems. In particular, INSPIRE network services utilise one
standard communication protocol and binding technology for all service types to
avoid mixing technologies: the Simple Object Access Protocol (SOAP), which
ensures streamlined integration and implementation, as well as maximum
benefit from the offered services. SOAP is a protocol specification for
exchanging structured information in the implementation of web services in
computer networks. It uses the XML Information Set for its message format,
and relies on other application layer protocols, most notably Hypertext
Transfer Protocol (HTTP) or Simple Mail Transfer Protocol (SMTP), for message
negotiation and transmission. In contrast to INSPIRE, it is to be expected that
the various VGI platforms use different communication protocols and
binding technologies through the platform owner’s Application Programming
Interfaces (APIs). However, VGI may reuse the two types of services provided
by INSPIRE, i.e. viewing and downloading. The former operation is typically
based on OGC Web Map Services (WMSs) or OGC Web Map Tile Services
(WMTSs), which are easy to integrate into a VGI application from the
technical as well as the legal point of view; the VGI application acts as a client
of a server that publishes data under the INSPIRE Directive. Most of
the INSPIRE view services are provided free of charge, but there may be
conditions that prevent their reuse for commercial purposes (European
Commission, 2007). The latter type of service, download, is based on OGC Web
Feature Services (WFSs), OGC Web Coverage Services (WCSs) and OGC Sensor
Observation Services (SOSs), among others, which are also easy to integrate
from a technological point of view. Data published through INSPIRE
download services may also have associated fees, but these charges should not exceed
the cost of collection, production, reproduction and dissemination, together
with a reasonable return on investment (European Commission, 2007).
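
To make the client role concrete, the Python sketch below builds a WMS 1.3.0 GetMap request of the kind a VGI application could send to a view service in order to use authoritative data as a background layer. The endpoint URL and the layer name in the usage note are placeholders, not real INSPIRE services, and the sketch only constructs the URL; it does not perform the request or handle service exceptions.

```python
from typing import Sequence
from urllib.parse import urlencode


def wms_getmap_url(endpoint: str,
                   layer: str,
                   bbox: Sequence[float],
                   width: int = 512,
                   height: int = 512,
                   crs: str = "EPSG:4326",
                   image_format: str = "image/png") -> str:
    """Return a WMS 1.3.0 GetMap URL for a single layer.

    Note: in WMS 1.3.0 the BBOX axis order follows the CRS definition,
    so for EPSG:4326 the order is min_lat, min_lon, max_lat, max_lon.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must contain exactly four values")
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"
```

For example, wms_getmap_url("https://example.org/inspire/wms", "hypothetical.layer", (34.5, 32.2, 35.7, 34.6)) yields a URL that a map widget could load as an image, provided the service's licence conditions allow such reuse.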

Once the aforementioned technical issues are resolved, an integrated data
and service sharing policy should be defined. Currently, INSPIRE requires
Member States to provide the institutions and bodies of the community with
access to spatial datasets and data services in accordance with harmonised
conditions based on a minimum set of conditions to be respected. Member States
are permitted exceptions to data sharing, and can even completely restrict
access to certain data or can set security measures for obtaining access to these
datasets and data services; for example, public-data access that may threaten
individual privacy or national security can be restricted. While SDI data are
under the full control of each Member State and some data are provided free
of charge, VGI data are generally freely accessible, even though in some cases
access is limited through restrictions. However, VGI platforms inherently
encourage registration of new users not only in terms of access, but also in
terms of inputting new data and editing existing data. As a result, some critical
security aspects may arise for society. For instance, how can a criminal VGI
contributor be identified if they try to promote illegal activities and fraudulent
information? (Legal issues of VGI are discussed in Chapter 6 by Mooney et al.,
2017.) The above discussion indicates that VGI cannot be governed by a strict
framework such as that applied for INSPIRE, because it involves volunteered
pieces of many GI infrastructures without an authoritative structure and scope.
Therefore, the focus should be on the minimum aspects that will ensure
interoperability, credibility and security of services and data.


4  The  Prospects  of  Integration 

4.1  Integration  for  Supporting  Conventional  Spatial  Tasks 

The combination of INSPIRE and VGI provides great potential for creating
a comprehensive information platform by linking the advantages of
authoritative information, i.e. quality assurance and normative status, with VGI
advantages, i.e. rapid, up-to-date and dynamic information (Wiemann and
Bernard, 2014). As a result, this integration can benefit NMAs,
administrators of VGI projects and end users, with consequent socio-economic impacts
(Campagna and Craglia, 2012). In particular, NMAs may have a real
opportunity to use crowdsourced data to update some of their databases when
they do not update them regularly due to the high costs involved, or
to add new data that are not available to them (Coleman et al., 2009). They
can also use crowdsourced data to detect changes or vernacular place names
(Olteanu-Raimond et al., 2017). On the other hand, INSPIRE can serve
as a basis for validating VGI information (Wiemann and Bernard, 2014).
Furthermore, end users may use this mix of official and spatio-temporal
data for any relevant purpose, e.g. for leisure (walking on unexplored nature
trails), for receiving notifications about an event (e.g. the impacts of an
earthquake), for travelling (e.g. which travel route to follow) and for professional/
authoritative decision-making (e.g. how to manage a physical catastrophe or
a crisis; Craglia, 2007; Wiemann and Bernard, 2014).

Some  efforts  towards  VGI/SDI  integration  for  the  aforementioned  purposes 
have  already  occurred  (Craglia,  2007),  e.g.  the  Linked  Map  project,  which  links 
GI  from  different  sources,  in  particular  SDI  and  VGI,  through  the  paradigm  of 
Linked  Data  (Lopez-Pellicer  and  Barrera,  2014).  Linked  Data  connects  related 
data  through  Web  technologies.  The  Linked  Map  project  has  converted  gov¬ 
ernment  datasets  provided  by  the  Spanish  National  Geographic  Institute  to 
Linked  Data  into  Resource  Description  Framework  (RDF)  data,  so  that  these 
datasets  can  be  linked  to  VGI  sources  (OSM,  DBpedia,  etc.)  and  can  be  inte¬ 
grated  using  RDF  links.  RDF  is  a  standard  model  for  data  interchange  on  the 
Web;  RDF  links  enable  Linked  Data  browsers  and  crawlers  to  navigate  between 
data  sources  and  to  discover  additional  data.  Another  successful  example  is 
the case of the Ordnance Survey, which has linked an administrative
geography dataset to other datasets on the Web, demonstrating the advantages of
explicitly encoding topological relations between geographic entities over
traditional spatial queries (Goodwin et al., 2008).
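
As a rough illustration of how an RDF link ties an authoritative record to a VGI record, the following Python sketch serialises three triples in N-Triples syntax. It is not taken from the Linked Map project: all example.org URIs and the place name are hypothetical, and only the owl:sameAs and rdfs:label predicates are real vocabulary terms.

```python
# Minimal sketch: RDF triples held as plain tuples and written as N-Triples.
# All example.org URIs are hypothetical and used for illustration only.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical identifiers for the same town in two datasets.
AUTHORITATIVE = "http://example.org/nma/municipality/123"
VOLUNTEERED = "http://example.org/vgi/place/456"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    (uri(AUTHORITATIVE), uri(RDFS_LABEL), literal("Example Town")),
    (uri(VOLUNTEERED), uri(RDFS_LABEL), literal("Example Town")),
    # The RDF link: states that both URIs identify the same real-world place.
    (uri(AUTHORITATIVE), uri(OWL_SAME_AS), uri(VOLUNTEERED)),
]

ntriples = to_ntriples(triples)
print(ntriples)
```

A Linked Data browser or crawler that retrieves the first resource could follow the owl:sameAs statement to the second dataset, which is the navigation behaviour described above.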


4.2  Integration  with  Social  Media 

Both  active  and  passive  Social  Media  Geographic  Information  (SMGI)  can  be 
integrated  with  SDIs  in  a  GIS  environment  to  perform  qualitative  and  quanti¬ 
tative  spatial,  or  more  complex,  multidimensional,  analyses  (Jankowski  et  al., 
2010;  Bugs,  2014;  Campagna  et  al.,  2015;  Longley  and  Adnan,  2016).  In  par¬ 
ticular,  the  integration  of  INSPIRE  and  VGI  may  generate  a  higher  level  of 
knowledge  than  INSPIRE  alone,  especially  in  those  domains  where  the  social 
component  of  data  plays  a  relevant  role,  such  as  in  politics,  geo-marketing, 
tourism  or  spatial  planning.  The  INSPIRE  model  may  be  extended  through 
integration  with  SMGI,  where  multimedia  data  (i.e.  texts,  images,  videos  or 
audio)  and  user  evaluations  of  the  portrayed  objects  or  phenomena  are  given 
with  a  time-stamp,  enabling  various  kinds  of  new  analysis,  such  as  the  spatial, 
temporal  and  statistical  analysis  of  user  interests  and  preferences;  multimedia 
analyses; behavioural analyses; or combinations of these analyses, among
others. Regarding the spatial analysis of user interests, the high number of
georeferenced posts on social media platforms such as Twitter, Instagram, YouTube,
Panoramio  and  Flickr  can  be  used  to  investigate  the  patterns  of  user  interests 
in  space  using  density  (Campagna,  2014)  and  clustering  functions  (Massa  and 
Campagna,  2014).  Data  from  such  platforms  can  be  accessed  through  APIs, 
georeferenced  and  saved  as  spatial  data  layers.  Using  SDI  services  such  as  WMSs 
or  WFSs,  GIS  software  can  easily  access  the  social  media  platform  through  the 
API,  enabling  the  seamless  integration  of  AGI  and  geo-UGC,  as  demonstrated 
by  Massa  and  Campagna  (2014).  The  overlay  of  spatial  data  layers  with  topo¬ 
graphic  SDIs  such  as  administrative  boundaries  may  offer  useful  hints  to  public 
authorities  in  understanding  not  only  which  places  are  important  to  the  com¬ 
munity  and  how  they  are  perceived  (Campagna,  2014),  but  also  the  composi¬ 
tion  of  a  community,  e.g.  local  people,  commuters,  tourists  or  others. 
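
The density idea can be sketched in a few lines of Python. This is only an illustrative grid count, not the density function used in the cited studies; the coordinates, the cell size and the function name grid_density are assumptions made for the example, and cells measured in degrees are not equal in area.

```python
import math
from collections import Counter


def grid_density(points, cell_size_deg):
    """Count points per square grid cell; keys are (column, row) indices."""
    counts = Counter()
    for lon, lat in points:
        col = math.floor(lon / cell_size_deg)
        row = math.floor(lat / cell_size_deg)
        counts[(col, row)] += 1
    return counts


# Hypothetical georeferenced posts as (longitude, latitude) pairs.
posts = [
    (9.11, 39.22), (9.12, 39.21), (9.13, 39.23),  # three posts close together
    (9.45, 39.05),
    (8.95, 39.95),
]

density = grid_density(posts, cell_size_deg=0.1)

# The cell with the most posts is a crude indicator of where interest concentrates.
hotspot_cell, hotspot_count = density.most_common(1)[0]
print(hotspot_cell, hotspot_count)
```

The resulting cell counts could then be saved as a spatial layer and overlaid on authoritative layers such as administrative boundaries.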

Similarly,  the  temporal  reference  is  often  an  available  attribute  in  SMGI, 
which  enables  the  study  of  when  given  places  or  infrastructures  and  services 
are  used  at  different  points  in  time.  In  addition,  spatial  statistics  of  user  pref¬ 
erences,  i.e.  the  collecting  of  posts  by  location,  enables  planners  to  analyse 
patterns  in  user  interests  at  different  scales.  An  example  is  given  in  Floris  and 
Campagna  (2014),  where  hotspot  analysis  has  been  used  at  the  regional  level 
to  study  tourist  preferences  by  profile,  before  further  analysing  single  hotspots 
with  a  tool  embedded  in  ArcGIS  called  the  Spatio-Temporal  Textual  analysis 
(Spatext-STTx)  suite  and  with  geographically  weighted  regression  to  explore, 
at  the  local  level,  what  physical  and  locational  factors  may  affect  those  prefer¬ 
ences.  Furthermore,  multimedia  analysis  is  well  developed  in  the  case  of  text 
analytics.  However,  it  is  currently  more  difficult  to  automatically  extract  useful 
information  from  images,  video  or  audio.  In  the  case  of  text,  many  software 
packages can be used to apply text analysis techniques ranging from the simple (e.g. word
frequency counts or tag clouds) to the more advanced (e.g. sentiment analysis).
These  techniques  can  be  easily  applied  to  subsets  of  SMGI  obtained  by  spa¬ 
tial,  temporal  or  user  query.  Moreover,  user  behavioural  analysis,  i.e.  querying 
SMGI  by  a  user,  enables  the  study  of  user  behaviour  in  space  and  time.  This 
information  can  be  used  to  analyse,  for  example,  whether  a  public  space  is  vis¬ 
ited  by  local  people  or  by  outside  visitors.  This  information  may  also  be  useful 
for profiling: for the users visiting a certain place or service, user
spatiotemporal footprints can be defined to identify people who mainly move locally,
regionally  or  internationally,  and  where  they  come  from. 
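
The combination of a spatial and temporal query with a simple text analysis can be sketched as follows. The post records, field names, bounding box and stopword list are all hypothetical, and the code illustrates the general idea rather than the STTx tool itself.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical SMGI records: user, location, time-stamp and text.
posts = [
    {"user": "u1", "lon": 9.11, "lat": 39.22,
     "time": datetime(2014, 7, 5, 10, 0), "text": "Beautiful beach, crowded beach bar"},
    {"user": "u2", "lon": 9.12, "lat": 39.21,
     "time": datetime(2014, 7, 6, 20, 30), "text": "Sunset at the beach"},
    {"user": "u3", "lon": 9.13, "lat": 39.23,
     "time": datetime(2014, 12, 1, 9, 0), "text": "Beach closed for winter"},
    {"user": "u1", "lon": 8.50, "lat": 40.10,
     "time": datetime(2014, 7, 7, 8, 0), "text": "Traffic on the road to the beach"},
]


def in_bbox(post, min_lon, min_lat, max_lon, max_lat):
    """Spatial query: is the post inside the bounding box?"""
    return min_lon <= post["lon"] <= max_lon and min_lat <= post["lat"] <= max_lat


def in_period(post, start, end):
    """Temporal query: is the post inside the half-open interval [start, end)?"""
    return start <= post["time"] < end


def word_frequency(texts, stopwords=frozenset()):
    """Count lower-cased words across all texts, skipping stopwords."""
    words = []
    for text in texts:
        words.extend(w for w in re.findall(r"[a-z']+", text.lower())
                     if w not in stopwords)
    return Counter(words)


# Subset: posts made in July 2014 inside a small bounding box.
subset = [
    p for p in posts
    if in_bbox(p, 9.0, 39.0, 9.5, 39.5)
    and in_period(p, datetime(2014, 7, 1), datetime(2014, 8, 1))
]

freq = word_frequency((p["text"] for p in subset),
                      stopwords=frozenset({"at", "the", "for", "on", "to"}))
print(freq.most_common(3))
```

Filtering on the "user" field instead of the bounding box would give the per-user view needed for the behavioural analysis described above.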

An  additional  application  of  the  Spatext  (STTx)  suite  is  that  made  in  a  case 
study  for  the  cyclone  Cleopatra  in  Sardinia  (Italy)  to  extract  all  relevant  data  and 
information  (e.g.  perceptions,  opinions  and  needs  from  the  local  communi¬ 
ties)  from  social  media,  i.e.  Twitter,  YouTube,  Wikimapia  and  Instagram.  These 
data  were  then  integrated  with  the  latest  official  datasets  for  further  analysis 
and  relevant  action  by  decision-makers.  Another  related  web  application  called 
‘Place,  I  care’  was  employed  to  support  urban  and  regional  planning  processes. 
In  particular,  the  aim  was  to  collect  information  from  concerned  citizens  about 
the  physical,  environmental  and  socio-cultural  space  to  support  collaborative 
and  participatory  planning.  Although  they  have  not  been  verified  yet  through 
a  systematic  analysis,  there  have  been  several  case  studies  on  the  application  of 
STTx  in  the  same  areas  with  different  SMGI  sources,  where  different  types  of 
users  returned  similar  results,  suggesting  further  research  should  be  devoted  to 
better  understanding  the  issue  of  representativeness. 

The  above  novel  analytics  may  result  not  only  in  increasing  the  real-time 
monitoring capability of geo-UGC in representing the state of territorial
systems, but also in supporting public participation and dialogue among
digitally enabled communities, which increasingly represent a substantial share of the
total  population  in  most  countries.  Other  similar  examples  can  be  found  in 
several  domains.  For  example,  the  US  Geological  Survey  (USGS)  uses  social 
networking to collect real-time, earthquake-related messages and early
information to accelerate the delivery of risk information and the response. Other related initiatives
aim  at  (spatial)  data  collection,  e.g.  Project  Noah2,  which  is  a  citizen  science 
web/mobile  tool  developed  to  explore  and  document  wildlife  around  the  globe. 
Similarly,  the  ZmapujTo.cz  mobile  application3  was  developed  in  2012  in  the 
context  of  an  ecological  project  to  combat  illegal  dumping  grounds  in  the 
Czech  Republic  and  contribute  to  solving  this  problem  with  the  involvement 
of  citizens  and  relevant  authorities.  At  the  time  of  creation,  there  was  only  a 
database  of  old  ecological  burdens,  which  covered  the  illegal  dumps  only  mar¬ 
ginally.  In  order  to  cover  the  largest  possible  area  and  utilise  the  potential  of 
crowdsourced  data,  a  platform  was  founded  for  information-gathering  from 
citizens. A modern, efficient and widely accepted platform was chosen for
mapping, while the mobile application and an interactive web form were used for
reporting. More than 2,500 illegal dumps were reported, and more than 40
municipalities  and  towns  took  part  during  the  lifetime  of  the  first  version.  In 
March  2014,  the  second  version  of  ZmapujTo.cz  was  launched.  This  version 
introduced  several  new  features.  The  most  important  change  was  the  ability  to 
report  not  only  illegal  dumping,  but  also  a  variety  of  other  problems  that  one 
can  encounter  both  in  town  and  in  the  countryside.  The  entire  website  was 
redesigned,  including  an  interactive  map  for  efficient,  fast  and  intuitive  work. 
Further  to  the  aforementioned  applications,  many  other  initiatives  are  aimed  at 
supporting  pluralism  and  public  participation  in  decision-making,  such  as  in 
the case of the SoftGIS approach (Kahila and Kyttä, 2009) adopted in the design
of  the  Maptionnaire  web  platform  (Kahila-Tani  et  al.,  2016). 

While  early  experiences  in  SDI/VGI  integration  and  analyses  may  still  be 
limited  to  expert  research  laboratories  or  to  the  fortresses  of  the  social  media 
corporations,  institutional  initiatives  such  as  MYGEOSS  may  trigger  further 
development  in  this  domain.  MYGEOSS  is  an  ongoing  project  (2015-16)  of 
the  European  Commission  to  develop  smart  Internet  applications  based  on  the 
Global  Earth  Observation  System  of  Systems  (GEOSS)  to  inform  European  cit¬ 
izens  about  the  changes  affecting  their  local  environment.  Specifically,  within 
this  project,  a  number  of  interactive  apps  were  developed  that  reuse  official  spa¬ 
tial  data  to  offer  interactive  services  to  the  end  users.  For  example,  an  applica¬ 
tion  called  ‘Know  Your  City!’,  developed  by  UbikGS,  presents  social,  economic 
and  environmental  indicators  on  a  map-based  quiz.  Similarly,  ‘Loss  of  the 
Night’, created by Interactive Scape GmbH & GFZ, is an application enabling
citizen  scientists  all  over  the  world  to  collect  quantitative  information  on  the 
changing  nighttime  environment,  and  MYGEOSS  Phenology  App  Response 
was  produced  by  the  Friedrich-Schiller  University  to  support  vegetation  phe¬ 
nology  analysis  using  satellite  data  and  data  collected  by  citizens4. 

Despite  the  aforementioned  efforts,  Lopez-Pellicer  and  Barrera  (2014)  note 
that the integration of INSPIRE with VGI has not yet gained the expected
attention, especially from large producers of GI, because of the technical
disadvantages of the current Linked Data mechanism (Schade et al., 2010).
Similarly,  Wiemann  and  Bernard  (2014)  state  that  this  integration  effort  is  in  its 
early  stages,  because  several  critical  issues,  which  have  been  discussed  earlier, 
need  to  be  considered.  Therefore,  it  seems  that  there  is  still  a  long  way  ahead 
for  a  full  integration  and  operation  of  a  global  GIS  platform,  which  is  a  concept 
set  out  in  the  next  section. 


5  Towards  a  Global  Integrated  GIS  Platform 

During  the  last  few  decades,  the  world  has  evolved  rapidly  because  of  the  con¬ 
tinuous  increase  in  the  urban  population,  new  needs,  modern  lifestyles  and 
technological  advancements,  creating  millions  of  individual  activities  with 
environmental, economic and social impacts at different levels. As a result,
sustainability at various levels and contexts has been introduced as one of the core
aims of society, an aim that will be better met if we understand the
complexity of interactions and interrelations between the parameters involved.
This  suggests  the  need  for  dynamic  information  systems  that  provide  reliable, 
accurate  and  real-time  data  to  support  intelligent  planning  and  management 
in  order  to  reach  optimum  decisions.  Visionary  and/or  applied  advanced  geo¬ 
spatial  tools  and  frameworks  that  move  in  this  direction,  such  as  the  GeoWeb 
(Dangermond,  2005),  Digital  Earth  (Craglia  et  al.,  2008)  and  VGEs  (Lin  et  al., 
2013),  have  been  proposed. 

The  GeoWeb  is  a  computer  network  providing  the  ability  to  integrate  and 
share  geospatial  information  locally  or  globally  via  the  Internet.  Through  the 
GeoWeb, the ideal system would be a wide network of distributed GIS
services constructed and implemented through various inter-organisational
collaborative agreements so that individual systems and communities might use each
other's services, splitting the world into geographic components and allowing
the  dynamic  integration  of  knowledge.  The  communities  involved  may  range 
from  simple  users  to  governments,  business  enterprises  and  professionals 
focusing  on  improving  their  decision-making.  Gradually,  these  communities 
may  expand,  interoperate  more  and  become  increasingly  synergistic;  hence 
the  system  might  be  driven  by  the  thousands  to  millions  of  participants  cur¬ 
rently  using  websites  such  as  Google  Earth  and  OSM.  Eventually,  these  services 
could  provide  a  global  network  of  open-access  geographic  knowledge  about 
the  planet  and  online  applications  (open  access  and  licence-based)  for  pro¬ 
cessing  this  information  to  produce  the  outputs  for  decision-making.  These 
functionalities  may  support  a  whole  range  of  applications  and  purposes,  sup¬ 
porting  regional,  national  and  even  global  applications,  solving  issues  rang¬ 
ing  from  routine,  static  and  structured  problems  to  problems  that  are  complex 
and  unstructured  (including  those  demanding  real-time  responses)  and  that 
depend  on  cross-organisation  and  cross-discipline  collaboration.  Both  GIS 
professionals and citizen sensors have a role in this system. The former have
the  skills,  knowledge  and  experience  of  authoritative  system  development  and 
operation,  while  the  latter  represent  the  ‘VGI-soldiers’  across  space  and  time 
who  voluntarily  collect  and  share  valuable  static  or  real-time  information  not 
available  to  SDIs  (Dangermond,  2015). 

Similarly,  the  vision  of  Digital  Earth  as  defined  by  Craglia  et  al.  (2008),  which 
refers to a virtual globe system, would provide access to vast amounts of
spatiotemporal multi-geoinformation for various levels of users, including
modelling tools to facilitate decision-making. Digital Earth has eight key
characteristics: it has multiple connected globes/infrastructures addressing the needs of
different  audiences;  it  is  problem-oriented,  i.e.  focused  on  various  key  appli¬ 
cation  themes  such  as  the  environment,  health  and  societal  issues;  it  enables 
space-temporal  search  in  real-time  from  both  sensors  and  humans;  it  allows 
spatial-based queries and advanced spatial analysis; it provides access to
models as well as to ‘what if’ scenarios and forecasts; it supports the visualisation of
abstract concepts and data types regarding global social issues, e.g. low income
and poor health; it is based on open access and public participation across
multiple technological platforms and media; and it is engaging, to enhance
interactive and exploratory learning for multidisciplinary education and science. Five
use  cases  that  would  comprise  the  vision  of  Digital  Earth  involving  a  unique 
platform  have  been  provided  by  Goodchild  (2012).  These  use  cases  involve 
Digital Earth as a geoportal, a visualisation service, a platform for simulation
and  prediction,  a  source  of  unprecedented  spatial  and  temporal  resolution,  and 
a  technology  fully  integrated  into  human  activities. 

In  a  similar  vein,  VGEs  involve  a  new  generation  of  Web-based  virtual  geo¬ 
graphic  analysis  platforms  to  facilitate  the  advanced  exploration  of  physical, 
environmental,  socio-economic  and  other  phenomena  to  solve  related  prob¬ 
lems  at  a  deeper  level  by  combining  state-of-the-art  geotechnology  and  knowl¬ 
edge.  Such  a  VGE  system  would  consist  of  four  basic  components:  (i)  the  data 
component  for  the  integration,  organisation  and  management  of  geographic 
information;  (ii)  the  modelling  and  simulation  component  for  the  dynamic 
analysis  of  geographic  phenomena  by  providing  experts  from  various  disci¬ 
plines  with  an  open  access  platform  to  develop  and  disseminate  distributed 
advanced  models  in  an  easy  and  collaborative  way;  (iii)  the  interactive  compo¬ 
nent  between  the  system  and  users  that  includes  external  and  internal  data  col¬ 
lection  tools;  and  (iv)  the  collaborative  component  that  enables  group  decision¬ 
making  for  significant  societal  problems  through  public  participation  in  the 
processes  carried  out  by  experts. 

Although  the  concept  of  Digital  Earth,  the  existing  technology  of  the 
GeoWeb and the use cases for VGEs have a common aim and functions, i.e. to
provide  advanced  geodata  hubs  and  sophisticated  spatial  analysis  tools  on  the 
Web,  they  have  some  differences  in  terms  of  their  focus.  In  particular,  Digital 
Earth and VGEs involve extended capabilities beyond the sharing of knowledge and
geoinformation offered by the GeoWeb, by providing advanced virtual reality,
processing,  simulation  and  analysis  models  for  solving  a  wide  range  of  complex 
spatial  problems.  In  addition,  VGEs  involve  more  problem-oriented  geotech¬ 
nology  tools  that  inherently  have  some  of  the  features  of  planning  and  decision 
support  systems,  while  the  Digital  Earth  concept  aims  to  provide  more  abstract 
tools  for  investigating  the  spatial  interactions  of  certain  domains. 

Based  on  the  aforementioned  visions,  we  try  to  shift  from  a  conceptual  con¬ 
text  for  creating  a  new-generation  geographic  tool,  to  a  more  practical  and  tan¬ 
gible  framework  for  developing  a  global  integrated  GIS  platform,  as  illustrated 
in  Figure  4.  This  framework  extends  the  capabilities  of  a  typical  data  hub  and 
the  benefits  of  integration  of  SDIs  with  VGI.  In  particular,  the  system  consists 
of  three  main  components:  integrated  data  infrastructures,  integrated  online 
applications  and  a  system  for  providing  outputs  (both  static  and  dynamic)  that 
could  lead  to  decision-making  and  actions.  As  an  alternative  to  providing  wide 
access  to  a  single  source  of  data,  the  Integrated  Data  Infrastructures  component 
can provide distributed data mashups by integrating vast stores of information
(from many sources in the public and private sector as well as from citizens)
and  of  many  different  types  of  data,  along  with  geospatial  services  that  can 
interact  and  be  used  to  create  new  information.  The  data  sources  can  be  SDIs 
such  as  INSPIRE  or  NSDI  in  the  United  States,  VGI  platforms  created  through 
various  projects  (e.g.  OSM  and  Wikipedia),  social  media  (e.g.  Facebook  or 
Twitter) and other media such as emails, mobile phones, instant messaging,
etc.  Existing  services  can  be  combined  to  make  new  services,  and  Geocom¬ 
munities,  which  are  currently  fragmented,  may  be  consolidated  in  a  loosely 
coupled  environment  and  create  new  synergies  (Esri,  2006). 

The  integration  of  online  applications  could  provide  functionalities  from 
simple  publishing  and  mapping/visualisation  to  advanced  GeoComputation 
modelling  (Abrahart  and  See,  2014).  In  particular,  the  current  Web-GIS  ser¬ 
vices  can  be  extended  to  provide  not  only  easy  map  publishing  and  viewing 
through  VREs,  but  also  basic  GIS  functions,  such  as  querying,  buffering,  over¬ 
lays,  etc.,  through  Open  Access  (or  licence-based)  online  GIS  software.  In  addi¬ 
tion,  focused  GIS  applications,  in  the  form  of  different  thematic  modules  (i.e. 
for  planning,  transport,  the  environment,  etc.)  embedded  in  the  online  GIS, 
may be offered through distributed geo-services based on Web and GIS server
technology and a service-oriented architecture (SOA) that is open, interoperable,
and  dynamic,  based  on  common  data  and  service  standards  and  specifications. 
Using  the  SOA  model  with  GIS  services,  users  can  integrate  their  desktop  and 
departmental  solutions  into  implementations  that  connect  many  departments 
and  organisations  (Dangermond,  2008).  The  Web  Services  architecture  allows 
users  to  both  federate  their  distributed  systems  and  integrate  GIS  and  spatial 
processing with other IT business systems, such as enterprise resource
planning (ERP), customer relationship management (CRM) and supervisory
control and data acquisition (SCADA). While this has been possible for some time,
the  advent  of  SOA  and  simple  technologies  to  integrate  these  services  has  made 
it  much  easier  and  promises  to  greatly  expand  the  GIS  market.  Ideally,  in  this 
context,  easy-to-build  ad-hoc  advanced  spatial  models  for  GeoComputation 
that  employ  artificial  intelligence  techniques,  for  example,  for  solving  compli¬ 
cated  problems  might  be  the  biggest  achievement  of  this  system. 

The  results  of  the  system  could  take  the  form  of  Dynamic  Outputs.  Outputs, 
which  result  from  the  processing  of  static  or  real-time  information,  can  have 
any  form,  i.e.  they  can  take  the  form  of  maps,  reports  and  messages,  and  mass 
notification  alerts.  In  particular,  maps  and  reports  in  text  or  tabular  form  are 
the customary outputs of a GIS and can be used by users for decision-making and
appropriate  actions.  Messages,  e.g.  through  phone  calls,  emails,  SMS,  Viber  etc., 
refer  to  real-time  reporting  to  administrations  and  organisations.  Similarly, 
mass  notification  alerts  refer  to  broad  notifications,  or  alerts,  sent  to  people 
in  a  specific  geographic  region  in  emergency  or  crisis  management  situations. 
The  tremendous  high-speed  evolution  of  the  Web  and  Geospatial  technologies 
suggests  that  this  ‘super’  global  Geo-system  is  not  far  away. 
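
The selection of recipients for a mass notification alert can be sketched as a distance filter around an incident. This is a minimal illustration under stated assumptions: the subscriber names, coordinates and 10 km radius are invented, and a real system would more likely test membership of an official alert polygon than a simple circle.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def recipients_within(subscribers, lon, lat, radius_km):
    """Return the names of subscribers located within radius_km of the incident."""
    return [
        name for name, (s_lon, s_lat) in subscribers.items()
        if haversine_km(lon, lat, s_lon, s_lat) <= radius_km
    ]


# Hypothetical subscribers with their last known (longitude, latitude).
subscribers = {
    "anna": (9.50, 40.92),
    "bruno": (9.52, 40.95),
    "carla": (9.11, 39.22),
}

# Alert everyone within 10 km of a hypothetical incident location.
alerted = recipients_within(subscribers, lon=9.50, lat=40.92, radius_km=10.0)
print(alerted)
```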

Fig. 4: The framework of an ideal global integrated GIS platform.


6 Conclusions

The  integration  of  SDIs,  and  in  particular  INSPIRE,  with  VGI  may  potentially 
provide  considerable  benefits  for  all  stakeholders  involved,  i.e.  public  and  pri¬ 
vate  organisations,  professionals  and  citizens,  because  each  technology  may 
complement the other. In particular, these may include benefits for specific
professional  groups  dealing  with  spatial  problems;  for  planning  and  decision¬ 
making;  and  for  the  wider  community,  which  may  enable  the  dissemination  and 
uptake  of  real-time  updated  information  regarding  daily  activities  (e.g.  traffic 
incidents)  or  emergency  situations,  physical  catastrophes  or  unknown  threats. 
Although  some  early  efforts  towards  this  integration  have  been  made,  this  pro¬ 
ject  is  not  an  easy  task,  since  several  technical  and  institutional  issues  need  to 
be  resolved,  as  discussed  earlier.  Ideally,  the  integration  could  be  extended  to 
creating  a  global  integrated  GIS  platform,  whose  general  framework  has  been 
presented  and  involves  similar  visions  and  concepts  to  Digital  Earth  and  VGEs. 
The  next  steps  should  be  focused  on  the  establishment  of  a  wider  network  of 
involved  stakeholders,  i.e.  academia,  industry,  public  authorities,  citizens  and 
NGOs,  in  the  context  of  a  well  defined  project  (e.g.  through  a  COST  Action)  to 
set  up  a  robust  framework  that  covers  all  of  the  aspects  of  the  project,  from  the 
initial  concept  to  its  implementation,  in  order  to  achieve  successful  examples  of 
integration and, ideally, an integrated GIS platform.


Abstract

Despite the considerable growth in Volunteered Geographic Information (VGI)
activities  in  citizen  sensing  and  the  evident  opportunities  for  VGI  use  in  map 
revision  and  updating,  few  European  National  Mapping  Agencies  (NMAs)  or 
other  types  of  government  bodies  have  engaged  significantly  with  VGI.  Moreo¬ 
ver,  the  level  of  engagement  of  NMAs  with  the  VGI  community  varies  greatly, 
and  most  of  them  have  proposed  their  own  tools  for  encouraging  citizens  and 
public  partners  to  collect  feedback  or  new  data.  There  are  numerous  barriers 
limiting  the  participation  of  citizens  and  public  partners  in  NMA  data  collec¬ 
tion,  including  data  quality  issues,  the  motivation  of  the  contributors  and  legal 
issues.  The  aim  of  this  chapter  is  to  give  an  overview  of  the  experiences  of  some 
European  NMAs  in  engaging  with  VGI.  Guidelines  and  recommendations 
to  support  wider  engagement  with  the  VGI  community  are  also  proposed  to 
help  NMAs  and  interested  government  bodies  exploit  the  potential  of  VGI  for 
authoritative  mapping. 


Keywords 

VGI,  authoritative  mapping,  VGI  platform,  data  collection,  data  quality 


1  Introduction 

Volunteered  Geographic  Information  (VGI)  initiatives  have  seen  considerable 
growth  in  citizen  sensing  (Goodchild,  2007).  Different  terms  are  used  in  the 
literature to describe this volunteered activity, such as crowdsourcing and
neogeography (Turner, 2006) or user generated spatial content (Antoniou et al.,
2009). See et al. (2016) give a complete review of the current terminologies
used  and  the  distinctions  between  them.  In  this  chapter,  the  focus  is  on  VGI  in 
the  context  of  European  National  Mapping  Agencies  (NMAs). 

With the adoption of the Open Data Policy1, which encourages the free
release of data that can be used and republished by any user, many government
datasets  are  now  freely  available  to  the  public,  including  spatial  data  from 
some  European  NMAs  (Brovelli  et  al.,  2016).  Some  NMAs,  such  as  those  of 
Finland  and  the  Netherlands,  have  released  their  datasets  under  open  access 
licences;  these  authoritative  data  have  been  integrated  into  OpenStreetMap 
(OSM),  which  has  improved  the  OSM  database.  More  studies  are  necessary 
to  determine  if  this  integration  may  also  have  benefits  for  NMAs.  The  Open 
Data  Policy  can  be  an  opportunity  for  both  NMAs  and  geographic  data  end 
users.  Indeed,  releasing  data  under  open  access  licences  through  a  platform 
can increase the usability of authoritative data, because end users such as
citizens can freely download and use data for different purposes. In addition, the
motivation for citizens and partners to contribute by adding new information,
giving  feedback  and  providing  alerts  on  errors  and  updates  can  also  increase. 

Although  local  governments  had  already  started  during  the  last  ten  years  to 
use  VGI  as  a  participation  platform  to  engage  in  a  dialogue  with  citizens  rather 
than  as  a  way  to  simply  gain  or  share  information  (Johnson  and  Sieber,  2013), 
there  has  been  a  noticeable  change.  Indeed,  more  recently,  different  initiatives 
have  been  proposed  by  local  governments  to  collect  data  for  different  purposes 
(such as in urban planning, in order to advertise new regulations) where
citizens have been considered both as sensors and as potential partners
(Karimipour and Azari, 2015; Sedano, 2016).

Traditionally,  almost  all  mapping  agencies  have  some  experience  in  collect¬ 
ing  information  from  their  data  users  by  receiving  alerts  regarding  mapping 
errors  or  updates.  However,  it  is  important  to  differentiate  between  passive 
processes  and  more  active  processes  in  which  the  mapping  agencies  actively 
engage  with  the  VGI  community  by  proposing  platforms  to  collect  and  dis¬ 
seminate  data  (See  et  al.,  2016). 

Olteanu-Raimond  et  al.  (2017)  have  recently  undertaken  a  detailed  review 
of  the  engagement  of  European  NMAs  with  VGI.  A  survey  was  undertaken  to 
elicit  experiences  with  VGI,  which  revealed  that  few  European  NMAs  are  cur¬ 
rently  engaged  with  VGI  and  that  those  have  developed  their  own  VGI  collec¬ 
tion  processes,  mostly  for  change  detection  and  the  reporting  of  alerts,  with  less 
frequent  examples  of  the  reporting  of  new  content,  vernacular  place  names  and 
photo interpretation (see Figure 1). In most cases the information gathered was
on traditional features included in standard topographic maps, such as roads,
buildings and names. Very few mapping agencies have harvested and used the
data collected by OSM or GeoNames.

Fig. 1: Use of VGI by European NMAs. Source: Olteanu-Raimond et al. (2017).
All rights reserved © John Wiley & Sons Ltd.

The  low  involvement  of  European  NMAs  with  VGI  is  related  to  five  major 
barriers,  which  have  been  discussed  in  detail  in  Olteanu-Raimond  et  al.  (2017); 
these  are  issues  of  data  quality  and  validation;  legal  issues;  issues  related  to  the 
nature  and  motivation  of  the  crowd;  sustainability  issues;  and  employment  fears. 

This  chapter  further  develops  the  work  of  Olteanu-Raimond  et  al.  (2017)  by 
proposing  a  typical  VGI  collection  workflow,  which  was  considered  by  many 
NMAs as a good practice. This type of VGI platform is based on the main
idea  of  a  volunteered  activity  where  contributors  contribute  directly  to  the  plat¬ 
form  by  adding  new  features  or  attributes,  correcting  existing  features,  etc.  It  is 
important to mention that the integration of data coming from other
crowdsourced activities, such as GPS traces from sports activities, is outside the scope
of this chapter. The chapter is organised as follows: Section 2 focuses on the
experiences  of  European  NMAs  with  VGI  by  presenting  some  specific  exam¬ 
ples.  Section  3  presents  some  recommendations  for  NMAs  as  a  response  to 
some  of  the  five  major  barriers  identified  in  the  use  of  VGI.  Finally,  conclusions 
and  future  research  directions  are  outlined  in  Section  4. 


2  Experiences  with  VGI 

As mentioned previously, most of the NMAs that engage with VGI have developed
their own tools to collect data from citizens or from public partners. The aim
of this section is to present an overview of some of these tools that
complements and updates the review reported in Olteanu-Raimond et al. (2017),
which describes the experiences of NMAs in Finland, France, Greece, the UK, the
Netherlands, Portugal and Switzerland, all of which responded positively to our
call to contribute.


2.1 Change Detection and Error Alerts

Change detection and error alerts are among the best-developed VGI activities
proposed by NMAs. Generally, alerts (e.g. to a new building or a new road name)
are used as triggers to improve the quality of authoritative databases. The
following outlines the experience of a series of NMAs in using VGI for change
detection and error alerting.

At  IGN  France,  change  detection  is  generally  undertaken  by  land  surveyors 
who  analyse  a  range  of  alert  types  and  then  contact  local  governments.  Since 
2008,  IGN  France  has  developed  various  applications  that  aim  to  report  alerts 
concerning  errors,  change  detection  or  vernacular  toponyms  (Viglino,  2009). 
These applications, deployed on different platforms and via different
technologies (e.g. the Web, Android mobile phones and GIS), are mostly
community-sourcing systems in which professional partners, such as fire
services and post offices, make reports on IGN data. A web application,
accessible through the French Geoportal, was also developed for citizens,
allowing them to make reports. These pioneering applications and their
encouraging results have led the IGN to propose a single community- and
citizen-sourcing portal2, on which citizens can complete a form and provide
location information, using GPS tracks, photographs or drawings, on an IGN
basemap. A new version of the application is being tested that allows partners
to access, add and modify features in an up-to-date copy of the topographic
database. Contributions are first checked and validated by the surveyors
against the data specifications, and quality expectations are checked using
quality indicators, visual checking and comparison with different data sources
(e.g. construction permits issued by municipalities). Depending on the type of
contribution, the VGI can be directly integrated into authoritative databases
or used as a trigger for field work to improve the geometric precision of
features.

With regard to future engagement with VGI, some research projects are
currently under consideration. For example, Ivanovic et al. (2016) are studying
the possibility of automatically inferring changes from additional sources
found on the Web, including GPS tracks from hiking websites. The EU-funded
Horizon 2020 LandSense project (2016-2020) will study the feasibility of
updating Land Use/Land Cover (LULC) maps using Sentinel and in-situ
citizen-derived data. Methods to aid quality assessment and conflict management
in order to validate and integrate citizen-derived data into the authoritative
database will also be explored (Leibovici et al., 2015).

In the Netherlands, Kadaster is running successful VGI activities, including
‘terugmelding BRT’ (alert on the Dutch Topographic Registry) and ‘terugmelding
BGT’ (alert on the ‘large scale’ Topographic Registry), to report new changes
and errors. Kadaster works as an open and transparent organisation, and
contributors can easily see what has been done with their alerts. To stimulate
and effectively motivate contributors, the staff working in the topographic
department promptly validates all reported alerts. By directly updating the
topographic maps when an error report is accepted, Kadaster shows its
appreciation to the contributors and stimulates the further participation of
citizens.
In  addition  to  the  traditional  data-updating  by  means  of  aerial  and  panoramic 
photographs,  there  is  a  growing  tendency  to  use  thematic  data  from  external 
sources.  The  latter  sources  include  governmental  organisations,  companies  and 
also  citizen  contributors.  In  this  context,  Kadaster  has  proposed  a  second  pilot 
(also  known  as  the  Sonneveld  index)  to  collect  data  on  religious  buildings  such 
as  churches,  mosques,  synagogues,  temples,  monasteries  and  chapels;  more 
than  1,000  addresses  were  collected  by  a  group  of  enthusiastic  contributors.  As 
a  result,  Kadaster  was  able  to  enrich  its  topographic  maps. 

Another VGI project was run to collect information on national border markers.
On 30 October 1980, the Netherlands and Germany signed an agreement on the
maintenance of the markers that define the borders between them; every three
years, the national border markers must be inspected and, where necessary,
maintained. In 2012, hikers were deployed to gather information about the
condition of national border markers using an ad hoc mobile application that
also allows a picture to be sent. As a result, Kadaster was able to decide
whether or not a particular marker needed maintenance. The border markers
application has recently completed its pilot phase, and a continuation of the
project is being developed.

Finally, the forest paths project was a recent pilot based on VGI activities.
In the Netherlands, the National Dutch Forest Organization (Staatsbosbeheer) is
responsible for data on forests. The aim of the forest paths project was to use
VGI to update the organisation’s datasets. Kadaster provided raw material to
forest rangers and asked them to verify and complete the map based on their
field work. Kadaster has successfully completed pilot projects in Horsterwold
and Flevopolder. The local forest rangers have updated their digital files on
forest paths in their region. Kadaster is researching how to implement this
method in the rest of the forested area in the Netherlands.

Ordnance Survey (OS), the NMA for Great Britain, has long engaged with
customers and the general public for alerts about real-world change or errors
reported in its paper or digital map products. While much contact is directly
via telephone or written correspondence, a web map-based tool has been
successfully trialled with public sector customers for reporting errors or
omissions in a range of OS products. Using the ‘Tell OS’ interface, customers
can locate, describe and submit their feedback for the product concerned. Their
alerts are acknowledged and the information is fed into product management
processes.

The OS Maps application also enables volunteered route-based information to be
shared. Aimed at outdoor activities, the application allows users to record
route information and share it with other users as part of its map display,
search and navigation functionality.


2.2  New-Feature  Collection 

VGI offers the potential to capture new features, or new information regarding
existing features, that NMAs have not previously collected because they fall
outside their mission priorities or are excluded for political or economic
reasons.

In the Netherlands, Kadaster is running pilot projects to collect new features.
One of these is the ‘Crowdsourcing at school!’ project, which is part of
Kadaster’s education programme. The aim of this initiative is to allow children
to become familiar with VGI and with advancing society, but also to introduce
them to the Kadaster organisation and its products and services. Children get a
geographic orientation of the world in a playful way, and they also learn about
their position within society. In this pilot, children collect data on emergency
services such as police, ambulance and fire services. This project can also be
used for data collection for other organisations or public services. The
curriculum for this project is in a pilot phase and the first results have
highlighted that VGI activities are not only for adults.

Linked to the large-scale renewal of Finland’s National Topographic Database,
a research project was launched by the National Land Survey of Finland (NLS) at
the beginning of 2016 to investigate the possibilities that VGI can offer in
authoritative data collection. The project will build a concept to define the
so-called ‘Citizens’ layer’ for the authoritative topographic data, that is, a
platform for data collection where citizens will be able to import or draw
points, lines and polygons representing topographic objects in the real world.
The concept will cover principles and tools for VGI data collection, e.g. for
building up the service and the user interface, as well as developing protocols
and methods for engaging with citizens (Mooney et al., 2016). The quality of
VGI and the best practices for using it will be identified in a pilot phase.
The project seeks to validate data quality and usability and to investigate the
possibilities of integrating the VGI collected in the pilot into the
authoritative database. As part of another research project, a hyper-local
geosocial networking application (hylo.mygeotrust.org) was introduced for
school children aged 14 to 15. With the mobile application, pupils were asked
to map different kinds of objects in their neighbourhood to share their
knowledge and observations. The initial results are encouraging. Children are
interested in their local environments and have volunteered to map and share
their knowledge on a map service. Based on these experiences, it seems
beneficial to introduce the concept of a ‘Citizens’ layer’ in schools as well.

Greek mapping authorities have been using VGI as a starting point to update or
create new mapping outputs. The crowdsourced data are treated as an initial
input layer that is compared against imagery backdrops (satellite or aerial).
The VGI datasets are corrected, completed and re-assigned to the local
nomenclature and then follow the normal processes for internally collected
data.

Direção-Geral do Território (DGT) is the NMA in Portugal; it coordinates
Portugal’s National Spatial Data Infrastructure (NSDI), SNIG, and develops
research on geographic information. Presently, research on VGI at the DGT is
focused on investigating how to use VGI in the production of official
topographic data3. The general idea is to use case studies to demonstrate the
potential benefits of including VGI as part of the authoritative database
implementation strategy; these benefits include filling gaps in official data,
enlarging the spatio-temporal coverage and addressing the aims of specific
communities of interest. These benefits are in line with the more collaborative
and participative approach presently adopted for SNIG development. To identify
and analyse the integration of VGI into the NSDI, the environment and planning
domain will be used as a target. Case studies will be designed to identify
required modifications to the NSDI, such as changes to the metadata catalogue
to accommodate VGI types or the interoperability and validation requirements
for incorporating VGI within the NSDI. Moreover, a prototype based on a web
service may become available through the NSDI geoportal, allowing any
registered citizen to edit LULC polygons by identifying geometry and/or
classification changes. This will enable the analysis of thematic and
positional inconsistencies reported by users and the definition of a strategy
for including VGI in the production of official mapping.

Looking to future uses of VGI, the research interests of most European NMAs
range from the motivational factors behind volunteer engagement in VGI to
change detection, data capture, and validation and management, all the way
through to data or service delivery and the associated quality and trust. In
addition to VGI involving citizens, community groups and expert groups,
exploring how VGI approaches might draw on the local knowledge of internal NMA
employees is also of interest.


2.3  Promoting  the  Usability  of  Authoritative  Data 

In the past, within a research context, the Centro Nacional de Informação
Geográfica (CNIG), which was later integrated into the Portuguese NMA (the
Instituto Geográfico Português (IGP), presently named Direção-Geral do
Território (DGT)), was involved in the GEOCID (Hipolito et al., 2000) and
Senses@watch (Gouveia et al., 2004; Gouveia and Fonseca, 2008) projects, which
represented early attempts to promote the involvement of citizens in the use or
production of geographic information, and which shared some of the issues
associated with the topic of VGI and its integration in an NSDI. The GEOCID
project aimed to promote the use of SDI by citizens and represented a first
effort to target citizens as users of these infrastructures, although it used a
top-down approach (Fonseca and Gouveia, 2005). Senses@watch was a research
project centred on the definition and evaluation of strategies to promote the
use of environmental spatial information, such as water quality and noise,
collected through citizens’ senses (e.g. vision, hearing, taste and smell). A
prototype of a Web-based collaborative site was developed, including an
interface for mobile phones.

These initial projects were successful, with a considerable level of citizen
participation, but did not have the intended follow-up in the NMA services and
workflows. Nevertheless, lessons can be learned from these experiences that can
enrich present approaches to VGI. These projects confirmed the increasing role
of the SDI data user and the importance of providing for participation at
multiple levels, where VGI can be seen as a resource for SDI. The assessment of
the pragmatic implications of using ICT to support citizen participation in
environmental monitoring and the identification of the major benefits of
involving volunteer contributors (e.g. the promotion of public awareness of
environmental issues; the cost-effectiveness of the method for maintaining data
collection activities; or the facilitation of the creation of early warning
systems), as well as the corresponding drawbacks (e.g. the lack of data
credibility), are just some of the insights about VGI provided by these
activities.

3  Recommendations  for  NMAs  regarding  VGI  Use 

Starting with good/best practices identified in NMA experiences and research
work, the goal of this section is to define recommendations for NMAs in
organising a platform to collect and manage VGI. Compiling a list of
expectations from both crowd or community sourcing and NMAs will help to ensure
a fruitful relationship between the two parties, as discussed by
Olteanu-Raimond et al. (2017). From the NMA point of view, issues such as
motivation, stability, consistency and minimisation of false entries are of
concern, while feedback, the citizen layer and transparency, among others, are
some of the concerns of the crowd and community. Here, we focus on six elements
that are either barriers to the use of VGI or key elements that allow for the
construction of a successful VGI platform for citizens, public and private
partners and governments. The six elements are as follows: the data model and
objects; the interface; motivation; identification; licensing; and quality
control.

Fig. 2: A typical VGI collection workflow.

A general workflow for VGI data collection is illustrated in Figure 2, where
these six elements are marked with an asterisk (*). In Figure 2, green and pink
arrows represent NMA and contributor tasks, respectively.

A successful platform should be dedicated to both contributors and users, and
should engage with citizens, specific groups of citizens sharing the same
interest (e.g. hiking), partners (e.g. governments, emergency services) and the
education community. Contributions should be made via user-friendly interfaces
that implement an adaptive data model as proposed by NMAs, via secured
identification, and via easy-to-use tools to contribute, manage, visualise and
download VGI and/or authoritative data, depending on each NMA’s data licence. A
real added value from NMAs is the quality control of volunteered data, which
can be corrected, validated and integrated into the VGI platform (Q-VGI to
VGI). Depending on the data specification, some validated VGI can be integrated
into the authoritative data (Q-VGI to NMA), in this way improving the accuracy
and quality of the NMAs’ data. Quality control could be performed continuously
by contributors, through the sharing of opinions on contributions, and
step-by-step by the NMAs.
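
The two integration routes named above (Q-VGI to VGI and Q-VGI to NMA) can be
read as a small state machine. The following Python sketch is only an
illustration of that reading: the status names, the `passes_checks` and
`meets_nma_specification` flags and the function names are assumptions
introduced here, and they do not describe the system of any particular NMA.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"            # received, not yet quality-controlled
    REJECTED = "rejected"              # failed quality control
    VALIDATED_VGI = "validated_vgi"    # Q-VGI to VGI: kept in the VGI platform
    INTEGRATED_NMA = "integrated_nma"  # Q-VGI to NMA: merged into authoritative data


@dataclass
class Contribution:
    contributor: str
    description: str
    status: Status = Status.SUBMITTED


def quality_control(contribution: Contribution, passes_checks: bool) -> Contribution:
    """NMA task: validate or reject a submitted contribution."""
    if contribution.status is not Status.SUBMITTED:
        raise ValueError("only submitted contributions can be quality-controlled")
    contribution.status = Status.VALIDATED_VGI if passes_checks else Status.REJECTED
    return contribution


def integrate(contribution: Contribution, meets_nma_specification: bool) -> Contribution:
    """NMA task: promote validated VGI into the authoritative data when it
    fits the data specification; otherwise leave it in the VGI platform."""
    if contribution.status is Status.VALIDATED_VGI and meets_nma_specification:
        contribution.status = Status.INTEGRATED_NMA
    return contribution
```

In this sketch a contribution that passes quality control always remains
available in the VGI platform, and only the subset that fits the data
specification moves on to the authoritative data.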

Table 1 summarises the recommendations described in the sections that follow
and provides a list of opportunities and threats that can arise from such an
NMA-VGI collection system. Opportunities and threats are described with respect
to different elements identified in NMA data collection systems.


3.1  Data  Model  and  Objects 

Generally, NMAs are in charge of producing topographic databases by mapping the
topography of the real world, focusing on specific types of objects described
by few thematic attributes (e.g. number of lanes of a road, building type;
Olteanu-Raimond et al., 2017). This implies that the existing features can be
enriched by adding thematic information (e.g. number of floors in a building),
but also that some new objects can be added. These new objects may be feature
classes that currently lack quality in official databases due to frequent
real-world change (e.g. POIs, shops, etc.), or data that can be most
efficiently collected by contributors because collection is not feasible with
remote mapping (e.g. hiking trails obscured by trees). Data that are of special
interest to citizens or to public services such as emergency services and
municipalities (e.g. vernacular place names and traditional names of
neighbourhoods (Castellote et al., 2013); obstacles, to help the navigation of
people with disabilities (Rice et al., 2013); or paths, to improve pedestrian
maps (Laakso et al., 2011)) could be mapped by citizens with local knowledge,
as suggested by Johnson and Sieber (2013).


Table 1: Opportunities, threats and recommendations for different elements that comprise an NMA-VGI data collection system.


3.1.1 Citizen and Partner Layer

Two of the identified barriers in using VGI are data quality and legal aspects
(Olteanu-Raimond et al., 2017); Johnson and Sieber (2013) have reported the
same finding regarding the use of VGI by governments and argued that a more
formalised VGI collection process may prove beneficial. One way to break down
these barriers is a participatory citizen and partner layer proposed by NMAs.
In this way, NMAs will first have the opportunity to add new content, but also
to increase the usability of traditional topographic data and, as a direct
consequence, to improve the accuracy of the data and enrich the thematic
information (e.g. ‘the building is a private school with three entrances’).
NMAs can then propose a formalised framework and standards, which would be
expected by governments, to collect VGI with a focus on data validation, data
quality assessment and integration methods, allowing the topographic data to be
used to support other types of specific data (e.g. water pump locations for
firefighters, billboard locations for local municipalities). We would make the
following recommendations:

• Authoritative basemaps should be used to make contributions (e.g.
topographic data, orthophotographs, satellite images, DEMs);

• NMAs should assess both the internal quality and the fitness-for-use of the
volunteered data and should correct errors by using authoritative data as a
topographic support; this could be a significant added value that would
encourage different users, such as public authorities, security and emergency
services, NGOs and citizens, to both contribute and use VGI;

• Volunteered data should be clearly marked with quality-control stamps to
easily distinguish between quality-controlled contributions and those not yet
quality-controlled;

• Campaigns for specific data collection purposes should be organised;

• User-friendly tools to import and download data should be provided; they
should allow for a detailed selection of suitable data according to
spatiotemporal and social criteria, and data should be served to end users in
different formats.

As mentioned in Section 2, the NLS of Finland has already decided to experiment
with this new citizen layer concept through a research project.
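
As an illustration of the quality-control stamp and of the spatiotemporal
selection recommended above, the following Python sketch filters volunteered
records by bounding box, collection date and stamp. It is a minimal example
under stated assumptions: the record fields, the `select_records` function and
the bounding-box convention (min_x, min_y, max_x, max_y) are hypothetical and
are not taken from any NMA platform. Social criteria are left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VolunteeredRecord:
    x: float                  # easting or longitude (assumed)
    y: float                  # northing or latitude (assumed)
    collected_on: date
    quality_controlled: bool  # the quality-control "stamp"


def select_records(records, bbox, start, end, only_quality_controlled=True):
    """Return the records inside bbox and the date range, optionally
    restricted to quality-controlled contributions."""
    min_x, min_y, max_x, max_y = bbox
    return [
        r for r in records
        if min_x <= r.x <= max_x
        and min_y <= r.y <= max_y
        and start <= r.collected_on <= end
        and (r.quality_controlled or not only_quality_controlled)
    ]
```

Keeping the stamp as an explicit field lets end users decide for themselves
whether to include contributions that have not yet been quality-controlled.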


3.1.2  Adaptive  Data  Models  for  Object  Collection 

NMAs should propose an adaptive data model to collect and monitor geographic
objects. This data model should allow contributors to:

•  report  updates  or  errors; 

•  add  new  attributes  and  objects  to  existing  class  objects; 

•  add  new  class  objects  identified  by  NMAs  as  out  of  the  specifications  of  the 
authoritative  topographic  databases  at  present,  but  important  for  different 
applications; 

• add new class objects if they fit end user needs;

• assess, as a mandatory requirement, the membership of an object in a class
object type;

•  import  data  collected  in-situ;  and 

•  ensure  interoperability  with  the  existing  topographic  data  model  in  order  to 
integrate  significant  and  validated  contributions  directly  into  authoritative 
databases  or  to  generate  updates. 
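
Some of the capabilities listed above can be sketched in a few lines of Python.
The sketch below is one possible reading, not a proposed standard: the class
and method names, and the `in_nma_specification` flag, are assumptions
introduced here for illustration. It covers adding attributes to existing
classes, adding new classes that lie outside the authoritative specification,
and mandatory class membership for every object.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureClass:
    name: str
    core_attributes: set  # attributes fixed when the class is defined
    extra_attributes: set = field(default_factory=set)  # added later by contributors
    in_nma_specification: bool = True  # False for classes outside the authoritative spec


class AdaptiveDataModel:
    def __init__(self):
        self.classes = {}

    def add_class(self, name, attributes, in_nma_specification=False):
        """Register a new class object; contributor-defined classes default to
        being outside the authoritative specification."""
        if name in self.classes:
            raise ValueError(f"class already exists: {name}")
        self.classes[name] = FeatureClass(
            name, set(attributes), in_nma_specification=in_nma_specification
        )

    def add_attribute(self, class_name, attribute):
        """Enrich an existing class with a new thematic attribute."""
        self.classes[class_name].extra_attributes.add(attribute)

    def add_object(self, class_name, attributes):
        """Create an object; membership of a known class is mandatory."""
        if class_name not in self.classes:
            raise KeyError(f"unknown class: {class_name}")
        feature_class = self.classes[class_name]
        allowed = feature_class.core_attributes | feature_class.extra_attributes
        unknown = set(attributes) - allowed
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        return {
            "class": class_name,
            "attributes": dict(attributes),
            "in_nma_specification": feature_class.in_nma_specification,
        }
```

Recording whether a class belongs to the authoritative specification is one
simple way to keep the model interoperable with the existing topographic data
model while still accepting contributor-defined content.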

Nevertheless, proposing a fairly open citizen and partner layer may introduce
some threats, such as the possibility of obtaining large volumes of
thematically heterogeneous data or data that are characterised by spatial and
thematic incompleteness.


3.1.3  Protocols  for  Object  Collection 

The lack of protocols and the potential problems that this may entail, as well
as recommendations for data collection, are discussed in more detail in Chapter
10 (Minghini et al., 2017). However, our additional recommendations regarding
protocols are as follows:

•  define  a  protocol  for  mapping  different  types  of  data  for  existing  or  new 
objects  that  balances  the  need  to  collect  a  minimum  set  of  attributes  and 
metadata  with  the  desire  for  completeness  in  the  data  collection  process; 

•  update  and  enrich  the  protocol  regularly  by  taking  into  account  end  user 
experiences; 

•  propose  a  forum  and  online  help  facility  to  share  experiences  with  and 
between  contributors  and  assist  contributors  when  needed;  discussion 
forums  have  been  proven  to  contribute,  in  some  cases,  to  the  creation  of 
more  reliable  data  (Haklay,  2010;  Perger  et  al.,  2012),  and  they  are  also  a 
valuable  tool  for  community  building. 


3.1.4  Instant  Feedback  to  Contributors 

From  the  different  NMA  experiences  outlined  above,  it  has  been  shown  that 
engaging  with  contributors  using  transparent  communication  is  crucial  for  a 
successful  and  sustainable  platform.  Good  communication  can  be  ensured  by: 

•  one  uniform  feedback  platform  for  the  different  products  offered  by  the 
mapping  agency:  topographic  data,  citizen  and  partner  layer,  updates  and 
error  updates,  etc.; 

•  contributor  involvement  in  the  feedback  system  building  process; 

•  e-mails  with  updates  concerning  the  status  of  contributions;  and 

•  the  display  of  contributor  data  immediately  or  in  near  real-time. 


3.2 Interface

Two kinds of user interface tools can be distinguished (Sabou et al., 2014):
acquisition interfaces designed for and used by contributors to carry out
crowdsourcing tasks, and management interfaces, which are required by the
managers of the VGI project to monitor progress, assess quality and manage
contributors. In this section we focus on the acquisition interface used by
contributors for data collection, designated here as the contributor interface.

Switzerland’s geoportal4 was recently awarded the ‘2015 eGovernment special
prize’5 at the ninth national eGovernment Symposium, which was held on 24
November 2015 in Bern and involved representatives from the worlds of business,
administration, politics and academia. The consistent use of open source
software, open standards and cloud computing was the reason for winning the
prize. This geoportal features many properties that an NMA contributor
interface should provide (e.g. an intuitive and contributor-friendly interface;
a VGI component with the recently renewed revision service with immediate
customer feedback; smooth, dynamic and interactive map navigation; a report
option for customer alerts; and the use of open standards). Our recommendations
for the contributor interface are as follows:

•  offer  a  contributor-friendly  interface  that  guides  the  contributor  to  supply 
all  the  information  required  by  the  protocols  (e.g.  metadata,  attributes); 

• define contributor-friendly interfaces that incorporate the NMA protocols
and best practices without negatively impacting upon contributor enthusiasm or
hampering the flow of data;

• provide tools to support the training of contributors and/or groups of
contributors according to the goal;

• through the interfaces for collaboration, address the need to educate
contributors and easily integrate the non-traditional types of data that they
might collect;

• whenever possible, make use of standard, interoperable data formats and
services in order to further extend the interface and/or integrate new services
and applications;

•  use  up-to-date  basemaps  and  do  not  overload  the  platform  with  multiple 
themes; 

•  adequately  accommodate  all  the  VGI  types,  which  are  of  a  diverse  nature, 
dynamic,  and  sometimes  produced  in  real-time; 

•  implement  the  full  editing  of  objects  rather  than  just  hints  attached  loosely 
to  existing  data  objects. 

In addition, an intuitive contributor interface can play another very important
role in the field of input data quality. While the basic principles of
human-computer interaction (HCI) should be kept intact and meticulously
followed, the contributor interface could be the vehicle for implementing a
number of elements of the protocol regarding the input of high-quality data
(more information on protocols for data capture can be found in Chapter 10 by
Minghini et al., 2017). It is common for NMAs to have protocols in place that
must be followed in order to achieve maximum homogeneity in the datasets
produced. Volunteered content should also follow similar rules. Thus, the
contributor interface, which in this case serves as the data capturing layer,
should be equipped with as many protocol elements as possible, balancing high
data integrity (and thus quality) against adequate freedom for the contributor.
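
One way in which a contributor interface could carry protocol elements is a
simple check on each submitted form before it is accepted. The Python sketch
below is illustrative only: the field names and the split between required and
optional fields are assumptions made here, not part of any NMA protocol. A
small required set keeps data integrity high, while optional fields leave the
contributor some freedom.

```python
# Hypothetical minimum set of fields a protocol might require.
REQUIRED_FIELDS = {"feature_class", "geometry", "observed_on"}
# Hypothetical optional fields that contributors may fill in freely.
OPTIONAL_FIELDS = {"name", "photo_url", "comment"}


def check_submission(form: dict) -> list:
    """Return a list of human-readable problems; an empty list means the
    form satisfies this (assumed) protocol."""
    problems = []
    for name in sorted(REQUIRED_FIELDS):
        if not form.get(name):
            problems.append(f"missing required field: {name}")
    for name in sorted(set(form) - REQUIRED_FIELDS - OPTIONAL_FIELDS):
        problems.append(f"unknown field: {name}")
    return problems
```

Returning messages rather than raising an error lets the interface guide the
contributor towards a complete submission instead of simply rejecting it.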


3.3  Motivation 

An important part of the success of using VGI is engaging people. Interested
readers will find a detailed discussion on user motivation and engagement in
Chapter 5 (Fritz et al., 2017). NMA experiences have shown that citizens are
often not really interested in getting paid or in being presented with awards
or prizes: having the possibility to contribute geographic information from
their personal surroundings with a direct impact on publicly visible maps, and
getting feedback from NMAs, are the main reasons to contribute. Nevertheless,
in order to increase the number of contributors and ensure sustainability, NMAs
should first promote and advertise their platforms among the crowd, and
secondly motivate, activate and reward contributions. However, when reward
systems are implemented, the rewards should not encourage contributors to
favour quantity over quality in their contributions.


3.3.1  Gamification  Techniques 

Undoubtedly, a contributor interface that enhances the contributor experience
can help to engage contributors; however, this factor alone is not enough to
create the drivers and sustain the motivation needed to attract a large pool of
contributors to an initiative. There are a number of research efforts around
the use of gamification techniques (Antoniou and Schlieder, 2014; Yanenko and
Schlieder, 2014) to achieve these levels of motivation. Gamification, loosely
defined, is the implementation of gaming practices in a non-game context. In
essence, gamification, through game mechanics and game design, can influence
participant behaviour. The aim of gamification is to help participants achieve
certain goals by enhancing engagement, improving performance and multiplying
participation efforts towards a goal. Thus, NMAs can considerably enhance
citizen motivation by implementing gamification processes for data capturing or
change detection.


3.3.2 Giving Feedback

Feedback  to  contributors,  given  by  sending  updates  concerning  the  status  of 
their  individual  or  group  contributions,  is  an  important  motivation  for  con¬ 
tributors.  Organisations  need  to  assess  the  likelihood  of  such  motivations  being 
strong  enough  in  a  prospective  contributor  community  to  ensure  the  sustain¬ 
ability  of  their  proposed  VGI  initiative  (Hickling  Arthurs  Low  Corporation, 
2012).  To  help  sustain  contributions  over  time,  some  recommendations  are 
listed  here: 

•  all  contributions  should  be  welcome  (e.g.  attributes  such  as  ‘gravel  road  now 
paved’  can  be  as  valuable  as  topographic  data); 

•  contributors  want  to  receive  acknowledgement  for  their  contributions  and 
to  get  rapid  evidence  that  these  contributions  have  been  used; 

•  the  process  of  making  contributions  should  be  as  easy  and  streamlined  as 
possible,  as  contributors  may  not  be  strongly  motivated  to  contribute  to 
extensive  feature  classification  and  the  metadata  requirements  of  public 
mapping  programs;  and 

• different contributor interfaces may be required for first-time or occasional
contributors than for internal production staff or external power contributors;
for example, tools that allow inappropriate content to be reported
through a link allow contributors some control over data quality
(Coleman, 2010; Esri, 2010; Hickling Arthurs Low Corporation, 2012).


3.3.3  Engage  with  Groups  of  Users 

A  number  of  advertising  activities  can  be  used  to  attract  contributors  to  a  VGI 
project.  In  a  study  on  the  impact  of  contributors  to  VGI  projects  (Schmidt  et 
al.,  2012),  it  is  proposed  to  attract  diverse  groups  of  contributors  with  project- 
related  mapping,  to  make  mapping  easy  for  beginners  and  to  keep  contributors 
mapping  with  social  mapping  events,  as  typically  happens  with  OSM  (Mooney 
et  al.,  2015).  Launching  campaigns  will  attract  a  number  of  users  for  a  time 
period,  whereas  connecting  relevant  user  groups  (e.g.  land  owners  having  an 
interest  in  maintaining  boundaries)  will  create  more  devoted  contributors.  In 
general,  people  who  use  the  data  will  feel  more  attached  to  the  project  and  will 
be  more  willing  to  contribute.  In  addition,  it  will  be  easier  for  them  to  find  pos¬ 
sible  errors  and  report/make  corrections  if  the  procedures  are  made  as  easy  as 
possible.  Campaigns  should  target  groups  such  as  landowners,  school  children, 
cyclists, joggers, scouts, orienteering enthusiasts, hunters, hikers and geocachers,
among others, who may be more willing to contribute due to their special
interests  and  because  they  will  take  personal  advantage  of  the  addition  of  the 
VGI  to  the  NMA  database. 


3.3.4 Engage with Public Partners

Some strategic public partners may be very important for the collection of
certain types of data. For example, municipalities can easily engage with
citizens for urban planning purposes, while security-related partners that
manage emergencies, such as civil protection authorities or firefighters, are
very often in the field and have specific data needs (e.g. fire hydrants,
obstacles, building entrances).


3.3.5  Engage  with  Schools 

Introducing  the  work  of  the  NMA  and  the  idea  and  principles  of  topographic 
data  collection  to  school  pupils  may  be  a  good  way  to  disseminate  knowledge 
and  could  shape  a  large  number  of  future  contributors.  To  put  this  idea  into 
practice,  the  following  recommendations  can  be  given: 

•  the  collection  of  data  must  be  integrated  within  an  education  programme 
(i.e.  teaching  by  collecting  data); 

•  close  cooperation  with  teachers  is  crucial:  teachers  are  busy  with  their 
everyday  work,  so  they  need  to  have  some  ready,  easy-to-adapt  teaching 
materials; 

•  school  pupils  are  a  very  motivated  group,  but  an  application  for  data  collec¬ 
tion  by  pupils  must  work  perfectly  and  rapidly; 

•  before  data  collection  starts,  it  is  important  to  explain  to  the  pupils  what 
crowdsourcing  is  and  how  it  works;  pupils  need  to  understand  that  they 
are  an  important  part  of  society  and  that  they  can  deliver  valuable  data  for 
others; 

•  data  collection  projects  for  pupils  need  to  have  some  fun  aspects  that  reward 
pupils  who  deliver  good-quality  data,  which  can  even  further  support  their 
motivation  -  gamification  is  one  possible  approach  that  may  fit  well  with 
the  needs  of  this  particular  group;  and 

•  different  stakeholder  groups  (pupils,  parents,  teachers,  etc.)  should  be 
invited  to  refine/improve  the  curriculum. 

Although  unrelated  to  VGI  data  collection  for  NMAs,  two  successful  examples 
of  the  engagement  of  pupils  that  have  taken  the  above  recommendations  into 
account  are  described  in  Brovelli  et  al.  (2016)  and  Ebrahim  et  al.  (2016). 


3.4  Registration 

Data  contributors  may  be  anonymous,  but  this  may  permit  vandalism  (e.g. 
mapping  fake  features  or  deleting  features  that  exist)  and  the  contribution  of 
fraudulent data or spam. It is still not entirely possible to distinguish between a
credible  VGI  contributor  on  the  one  hand  and  an  incompetent  one,  a  mischief- 
maker  or  an  outright  vandal  on  the  other  hand  (Coleman,  2010),  although 
research  is  ongoing  in  this  area:  for  example,  Ciepluch  et  al.  (2010)  have  stud¬ 
ied  the  history  and  the  profiling  of  contributors;  Van  Exel  et  al.  (2010)  have 
proposed  the  experience,  recognition  and  local  knowledge  of  the  individual 
as  an  indicator  of  quality  input;  and  D’Antonio  et  al.  (2014)  have  proposed  an 
evaluation  model  for  the  contributor’s  reputation  and  data  trustworthiness. 
However, based on the NMAs’ experiences, very few bad contributions have
been spotted, and in general at least 80-90% of the citizen contributions are
useful (Olteanu-Raimond et al., 2017). Registered contributors are expected
to contribute more consistently, since participating in the registration
process demonstrates their motivation and their willingness to be identified. Apart from
the  contributor  identification,  registration  has  additional  advantages  (e.g.  the 
contribution  can  be  saved  and  finalised  later  by,  for  instance,  tagging  the  posi¬ 
tion  in  the  field  and  submitting  the  contribution  later  by  using  a  computer  at 
home).  Three  different  types  of  profiles  (see  Table  2)  could  be  made  available 
when  contributors  register  depending  on  the  type  of  organisational  model  for 
data  collection  used  and  the  validation  process  applied  by  the  NMA  afterwards; 
these  include: 

•  Strong  registration:  this  may  create  more  powerful  contributors  in  terms  of 
permitted  activities  with  the  data.  Full  identity  should  be  required  in  sensi¬ 
tive  cases  such  as  cadastral  information  for  property  owners.  Another  cat¬ 
egory  may  be  a  contract  contributor  (other  authorities,  for  example)  with 
specific  permissions  but  also  contribution  obligations. 

•  Light  registration:  this  type  of  identification  allows  the  organisation  collect¬ 
ing  the  data  to  contact  the  contributor  if  needed  and  to  learn  more  about 
contributors,  e.g.  to  determine  potentially  useful  information  such  as  their 
field  of  expertise. 

•  Weak  registration:  this  only  requires  a  valid  email  and  password  for  regis¬ 
tration  to  create  a  user  account. 


Table 2: Types of contributor registration profiles.

Profile               Information required
Strong registration   Full identity, e.g. from e-government authentication systems, such as: full name, full address or postcode, profession and institution, phone number, passport/ID number
Light registration    Valid email; profession; age / age group; gender
Weak registration     Valid email; password
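
As an illustration, the three registration profiles of Table 2 could be encoded as sets of required fields and checked when a registration form is submitted. This is only a sketch: the field names and the missing_fields helper are hypothetical and are not taken from any NMA system.

```python
from enum import Enum


class RegistrationProfile(Enum):
    STRONG = "strong"
    LIGHT = "light"
    WEAK = "weak"


# Required fields per profile, following Table 2 (field names are hypothetical).
REQUIRED_FIELDS = {
    RegistrationProfile.STRONG: {
        "full_name", "address", "profession", "institution", "phone", "id_number",
    },
    RegistrationProfile.LIGHT: {"email", "profession", "age_group", "gender"},
    RegistrationProfile.WEAK: {"email", "password"},
}


def missing_fields(profile, form):
    """Return the sorted required fields that are absent or empty in the form."""
    return sorted(f for f in REQUIRED_FIELDS[profile] if not form.get(f))


# A weak registration without a password is incomplete.
print(missing_fields(RegistrationProfile.WEAK, {"email": "volunteer@example.org"}))
```

Keeping the requirements in one table-like structure makes it easy to adapt them to the organisational model and validation process chosen by the NMA.
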


3.5 Quality Control

In  Chapter  7  (Fonte  et  al.,  2017),  an  overview  of  the  quality  indicators  that  can 
be  used  to  assess  VGI  is  presented.  Traditional  spatial  data  quality  assessment 
measures  can  be  used.  These  can  be  applied  using  reference  data,  such  as  con¬ 
trol  data  provided  by  experts,  or  through  the  comparison  of  data  coming  from 
several  sources,  which  may  even  be  VGI,  enabling  the  assessment  of  logical 
consistency.  Additionally,  other  indicators  can  be  used  to  assess  the  reliabil¬ 
ity  of  the  data,  such  as  metadata  on  the  data  acquisition  procedure,  indicators 
about  the  contributor,  socio-economic  indicators  or  the  consistency  of  corre¬ 
sponding  data  with  different  origins. 

The  quality  control  could  be  carried  out  at  different  levels  that  aim  to  facili¬ 
tate  the  final  validation  by  the  NMA  (which  is  mandatory),  as  outlined  in  the 
following  subsections. 


3.5.1  Level  0:  Real-time  Control  Procedures 

This  initial  level  of  quality  control  ensures  that  the  minimum  required  infor¬ 
mation  specified  in  the  data  collection  protocol  is  provided  and  that  no  incon¬ 
sistencies  are  introduced.  It  aims  to  assist  the  contributor  in  mapping  valid 
information  and  is  performed  during  the  collection  phase.  Note  that  the 
absence  of  inconsistencies  does  not  imply  that  the  data  are  accurate  and  reli¬ 
able.  It  controls: 

•  Required  metadata  and  attributes.  If  the  minimum  information  required 
by  the  protocols  is  not  provided,  alerts  to  the  contributors  asking  for 
additional  information  should  be  sent.  The  submission  of  a  new  contri¬ 
bution  can  be  approved  once  this  control  check  is  successful,  i.e.  the  con¬ 
tributor  can  then  submit  the  data.  Care  should  be  taken  to  minimise  the 
mandatory  information  needed  to  avoid  negatively  impacting  contribu¬ 
tor  motivation. 

•  Logical  consistency.  Automatic  rules  implementing  some  basic  topological 
consistency  checks  (e.g.  a  polygon  must  be  closed,  roads  should  be  topo¬ 
logically  related,  etc.)  should  be  applied  in  real-time  when  a  contribution 
is  submitted.  The  system  should  recognise  the  presence  of  some  types  of 
inconsistencies  in  contributions  and  then  not  allow  the  submission  of  these 
contributions,  such  as  in  the  case  of  clear  topological  mistakes;  in  other 
cases,  the  system  should  not  prohibit  submission  but  generate  warning 
messages  to  the  contributors  suggesting  corrections  or  additional  checks,  as 
contributors  may  in  fact  be  providing  important  information  about  signifi¬ 
cant  changes  in  the  terrain. 
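
The two Level 0 controls above can be sketched as a single check that separates blocking errors from non-blocking warnings. The feature layout (a dict with "attributes" and "ring"), the required attribute names and the "overlaps_existing" flag are assumptions made for this example.

```python
def level0_check(feature, required=("feature_type", "source")):
    """Return (errors, warnings) for a contribution given as a dict.

    Errors block submission; warnings only ask the contributor to re-check.
    """
    errors, warnings = [], []

    # Required metadata and attributes (names are hypothetical).
    attributes = feature.get("attributes", {})
    for key in required:
        if not attributes.get(key):
            errors.append(f"missing required attribute: {key}")

    # Logical consistency: a polygon ring must be closed (first vertex equals
    # last vertex) and needs at least four vertices including the repeat.
    ring = feature.get("ring", [])
    if len(ring) < 4 or ring[0] != ring[-1]:
        errors.append("polygon ring is not closed")

    # Softer rule: an overlap with an existing feature may reflect a real
    # change in the terrain, so warn instead of blocking.
    if feature.get("overlaps_existing", False):
        warnings.append("feature overlaps an existing feature; please confirm")

    return errors, warnings


square = {
    "attributes": {"feature_type": "building", "source": "field survey"},
    "ring": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
}
print(level0_check(square))
```

Returning errors and warnings separately lets the interface refuse clear topological mistakes while still accepting contributions that may document genuine change.
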


3.5.2 Level 1: Applying Automatic Quality Control Methods to the Volunteered Data

The goal of Level 1 checks is to assess data quality by applying
automatic methods. Three approaches are recommended:

•  Data  reliability.  Automatic  procedures  can  be  used  to  perform  an  ini¬ 
tial  check  of  data  reliability.  Several  sources  of  VGI  can  be  used  to  assess 
data  agreement:  this  refers  to  comparison  of  corresponding  positional  and 
attribute  information,  such  as  the  position  of  roads  in  different  data  sources. 
The  logical  consistency  of  contributions  can  also  be  used  to  assess  their  reli¬ 
ability;  for  example,  if  a  building  is  positioned  inside  a  lake,  the  information 
may  be  considered  to  have  low  reliability. 

•  Contributor-based  data  reliability.  If  a  prior  assessment  of  contributor  reli¬ 
ability  is  performed,  for  example  by  maintaining  historical  data  on  the  con¬ 
tributors,  it  is  possible  to  associate  a  degree  of  reliability  to  the  data  that  is 
related  to  the  reliability  of  the  contributor. 

•  Specification-based  reliability.  Reliability  of  contributions  can  also  be 
assessed  by  considering  NMA  specifications  associated  with  the  object 
being  contributed.  For  example,  if  a  building  with  an  area  lower  than  the 
NMA  specification  for  the  minimum  size  of  buildings  is  mapped,  this  con¬ 
tribution  can  be  automatically  tagged  as  not  fit-for-purpose. 
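
Two of the automatic checks listed above lend themselves to a short sketch: a specification-based test of the minimum building area and a contributor-based reliability estimate derived from historical data. The smoothing prior in contributor_reliability is an assumption of this example, not a method proposed in the chapter, and coordinates are assumed to be planar (e.g. metres in a projected system).

```python
def polygon_area(ring):
    """Planar area of a closed ring [(x, y), ...] using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def meets_min_area(ring, min_area):
    """Specification-based check: is the footprint at least the minimum area?"""
    return polygon_area(ring) >= min_area


def contributor_reliability(accepted, total, prior=0.5, prior_weight=2):
    """Smoothed share of a contributor's past contributions that were accepted.

    The prior keeps the estimate at a neutral value for contributors with
    little or no history (an assumption made for this sketch).
    """
    return (accepted + prior * prior_weight) / (total + prior_weight)


footprint = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]  # 4 x 3 rectangle
print(polygon_area(footprint), meets_min_area(footprint, 20.0))
print(contributor_reliability(8, 8))
```

A contribution that fails meets_min_area could be tagged as not fit-for-purpose, while the contributor estimate could be attached to new data as an initial reliability value.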


3.5.3 Level 2: Crowdsourcing Revision

Crowdsourcing revision consists of:

•  In-situ  campaigns.  Mapping  agencies  can  organise  in-situ  campaigns  asking 
contributors  to  assist  in  the  validation  of  some  highlighted  complex  cases 
where  the  NMA  has  insufficient  information  to  perform  a  final  validation. 
An  example  of  this  can  be  the  assignment  of  a  land  cover  class  to  particular 
locations  when  no  field  visits  have  been  made. 

•  Peer  validation  between  contributors.  The  VGI  platform  should  provide 
additional  capabilities  to  enable  contributors  to  vote  on  (‘thumbs  up’  or 
‘down’)  or  to  comment  and  discuss  contributions.  Discussions  and/or  com¬ 
ments  can  generate  new  insights  into  the  main  difficulties,  and  eventually 
some  reliability  indicators  can  be  associated  with  specific  types  of  contribu¬ 
tions  or  to  some  areas,  e.g.  the  classification  of  certain  land  cover  types  may 
be  difficult  for  contributors,  or  contributions  from  a  particular  area  can  be 
found  to  have  different  interpretations.  These  indicators  can  be  useful  for 
assessing  data  reliability. 
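
Peer votes can be turned into a simple reliability indicator, for example the share of positive votes once enough peers have voted. The minimum of three votes used below is an arbitrary assumption for illustration.

```python
def peer_agreement(up, down, min_votes=3):
    """Share of 'thumbs up' votes, or None when too few peers have voted."""
    total = up + down
    if total < min_votes:
        return None  # not enough evidence to derive an indicator
    return up / total


print(peer_agreement(1, 1))  # too few votes
print(peer_agreement(3, 1))
```

Returning None instead of a number for sparsely reviewed contributions avoids treating a single vote as a reliable signal.
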


3.5.4 Level 3: Final Validation with Respect to a Typology of Quality Assurance

Methods  to  visualise  quality,  discussed  in  Chapter  9  (Skopeliti  et  al.,  2017),  can 
considerably  enhance  this  step  in  the  process.  Some  recommendations  include: 

•  Define  a  typology  of  quality  and  associate  the  indicators  previously  assessed 
to  either  qualitative  or  quantitative  rankings;  these  rankings  can  be  based 
on  probabilistic,  fuzzy  or  possibilistic  approaches. 

•  To  optimise  the  use  of  resources  and  procedures,  NMAs  may  perform  vali¬ 
dation  only  on  volunteered  data  that  are  considered  to  be  worth  validating 
depending  on  the  indicators  obtained  in  the  previous  step  that  are  consid¬ 
ered  relevant  for  each  dataset. 

•  Final  decisions  are  taken  on  the  quality  of  the  contributions  and  their  use¬ 
fulness,  depending  on  the  quality  values  obtained  with  the  adopted  quality 
typology. 

•  As  the  final  validation  assesses  how  good  the  contributor  input  was,  this 
information  may  be  used  to  rank  contributor  performance. 
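
As a sketch of how previously assessed indicators might be mapped to a qualitative ranking, the example below uses a plain weighted mean with fixed thresholds. This is a deliberately simple stand-in for the probabilistic, fuzzy or possibilistic approaches mentioned above; the indicator names, weights and thresholds are all assumptions.

```python
def quality_rank(indicators, weights, thresholds=((0.8, "high"), (0.5, "medium"))):
    """Combine indicators in [0, 1] into a score and a qualitative class.

    indicators and weights are dicts keyed by indicator name; thresholds
    are (lower bound, label) pairs checked from highest to lowest.
    """
    total_weight = sum(weights[name] for name in indicators)
    score = sum(indicators[name] * weights[name] for name in indicators) / total_weight
    for bound, label in thresholds:
        if score >= bound:
            return score, label
    return score, "low"


weights = {"peer": 1.0, "contributor": 1.0}
print(quality_rank({"peer": 1.0, "contributor": 0.5}, weights))
```

An NMA could then restrict manual validation to contributions above a chosen class, and feed the final decision back into the contributor's history.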


3.6  Licensing 

With  VGI,  an  important  issue  arises  regarding  the  intellectual  property  of  the 
data,  which  should  be  handled  through  licensing  and  consent.  Contributors 
should  give  the  NMA  full  rights  to  the  data  so  that  the  NMA  can  take  full  advan¬ 
tage  of  the  contributed  data;  this  consent  can  be  obtained  either  during  the  reg¬ 
istration  phase  or  after  the  first  contribution  is  made.  The  contributors  should 
be informed that by contributing, they are providing geographic data to the
official national basemap. A well-defined licence for the NMA’s sharing and use of
geographic data should be provided to and agreed upon by the contributors.

Some  other  legal  aspects,  such  as  liability  and  privacy,  can  differ  from  coun¬ 
try  to  country  or  from  product  to  product.  These  aspects  are  discussed  in  more 
detail  in  Chapter  6  (Mooney  et  al.,  2017). 

4  Conclusion 

In this chapter, the VGI experiences of a few European NMAs were reviewed,
and guidelines and recommendations were presented to help mapping agencies
better exploit the opportunities offered by VGI through the voluntary
activities of contributors.

Due  to  its  nature  and  characteristics,  VGI  is  still  seen  by  NMAs,  and  more 
generally  by  government  bodies,  as  having  low  quality  and  as  a  source  of  unreli¬ 
able  data.  Therefore,  few  NMAs  are  engaged  with  VGI.  When  they  are  engaged, 
they have generally proposed their own tools to collect reports, and only rarely
has VGI been used to collect data on features beyond the standard set mapped
by  NMAs. 

Even though this type of data needs the development of new and different procedures
for collection (see Chapter 10 by Minghini et al., 2017) or quality assessment
(see Chapter 7 by Fonte et al., 2017) to become of major interest, VGI is
nevertheless  a  valuable  source  of  data,  as  it  may  help  NMAs  to  provide  data  that 
are  more  up-to-date  as  well  as  to  collect  new,  additional  data  that  better  address 
user  needs.  New  features  usually  not  collected  by  NMAs,  either  due  to  cost 
restrictions  or  because  they  represent  non-traditional  topographic  data,  could 
be  of  value  to  citizens  and  to  various  public  services  and  government  agencies. 

To  engage  with  the  VGI  community,  the  main  recommendation  for  an  NMA 
is  to  build  a  VGI  platform  that  allows  users  to  make  reports  but  also  to  collect 
new,  additional  features  that  are  not  traditionally  collected,  to  create  a  citizen 
and  partner  layer.  An  increasing  number  of  VGI  projects  to  collect  data  have 
been  proposed  during  the  last  decade.  As  noted  in  the  review  by  See  et  al. 
(2016),  there  is  considerable  variability  in  both  the  sustainability  and  the  goal 
of  the  VGI  projects.  Some  of  them  have  been  successfully  operating  for  a  long 
time  while  others  have  a  finite  life,  being  linked  to  some  specific  events,  or  are 
no longer active or available online. Moreover, few governments and municipalities
have proposed platforms to collect data from citizens for purposes such
as urban planning. Other public services, such as medical emergency departments
or fire services, use their own resources to collect specific spatial data
(e.g. water pumps, obstacles, building entrances), which need to be matched to
spatial reference data.

Being  aware  of  these  current  practices  and  initiatives,  the  question  of  why 
an  NMA  should  also  propose  a  VGI  platform  is  a  relevant  one.  We  believe  that 
NMAs,  as  public  bodies,  on  the  one  hand  are  officially  responsible  for  provid¬ 
ing  accurate  and  reliable  information  through  SDIs  to  all  potential  users  and, 
on  the  other  hand,  have  the  necessary  expertise  to  manage  and  integrate  spa¬ 
tial  data.  Moreover,  all  of  the  public  initiatives  mentioned  earlier  could  not  be 
implemented  without  important  financial  and  human  resources  for  deploying 
the  GIS  systems  to  collect,  manage  and  maintain  data  and  to  train  agents  to 
deal  with  spatial  information.  We  believe  that  a  stronger  collaboration  between 
NMAs  and  governments  through  a  VGI  platform  could  result  in  a  public-cost 
reduction  and  a  better  service  to  citizens,  where  these  could  be  more  involved 
in  decision-making  or  in  supporting  security  issues  that  affect  their  lives  in  a 
positive  or  negative  way.  Thus,  for  a  successful  VGI  platform,  one  of  the  most 
important  recommendations  is  to  engage  with  citizens  in  general,  specific 
groups  of  citizens  having  the  same  interests,  and  groups  of  public  and  gov¬ 
ernmental  bodies,  including  the  educational  system.  Engaging  with  different 
public  bodies  and  with  the  educational  sector  will  increase  citizen  involvement 
since  these  bodies  are  close  to  citizens  and  may  invest  in  the  future  by  educat¬ 
ing  and  raising  the  awareness  of  younger  generations  regarding  the  relevance 
of spatial data and their quality. These engagements could create motivation,
increase  sustainability  and  promote  good-quality  data  for  both  NMAs  and  the 
contributors. 

Another  important  aspect,  more  oriented  towards  citizens,  that  can  increase 
motivation is gamification. However, when implementing reward systems (whether
gamified or real-life rewards), attention should be paid to the fact that data quality
is much more important than quantity, and this should be clearly explained
to  the  contributor.  Thus,  a  good  practice  for  gamification  is  to  avoid  giving 
rewards  or  prizes  based  on  (or  only  on)  the  number  of  contributions  made. 

We  feel  that  a  platform  based  on  the  recommendations  discussed  in  this 
chapter  is  feasible  and  can  be  carried  out  in  a  step-by-step  manner  through 
the  development  of  pilots  and  research  projects,  as  exemplified  by  the  ongoing 
initiative  of  the  Finnish  mapping  agency,  which  is  defining  and  preparing  to 
test  the  concept  of  a  citizen  layer. 

Due  to  the  importance  of  and  increasing  trend  in  VGI,  we  believe  that  NMAs 
should  develop  national  VGI  platforms  for  both  data  collection  and  data  dis¬ 
semination,  even  if  it  is  difficult  to  predict  if,  or  when,  these  initiatives  will 
really  become  a  ‘standard  practice’  for  all  NMAs. 


Abstract

This  chapter  highlights  two  types  of  georeferenced  User-Generated  Con¬ 
tent  (geo-UGC)  that  show  considerable  potential  for  fruitful  usage  in  spatial 
planning  in  practice:  Volunteered  Geographic  Information  (VGI)  and  Social 
Media  Geographic  Information  (SMGI).  By  describing  selected  case  studies, 
the  chapter  illustrates  how  geo-UGC  can  be  used  at  different  stages  of  spatial 
planning  processes,  supporting  a  more  pluralist  understanding  of  places,  fos¬ 
tering  the  collaboration  between  decision-makers  and  contributing  to  a  more 
participatory  practice  in  spatial  planning.  The  Geodesign  approach  is  used  as 
the  framework  for  underpinning  the  discussion.  Selected  case  studies  devel¬ 
oped  by  the  authors  are  presented  showing  how  geo-UGC  can  be  beneficial  for 
building  knowledge  on  current  urban  and  territorial  dynamics,  for  identify¬ 
ing  possible  alternative  futures  and  for  finding  agreement  on  preferable  future 
developments.  In  all  the  selected  cases,  large  numbers  of  users  were  involved 
in collecting volunteered content. The findings are also interpreted within the
Smart  Cities  paradigm,  where  participation  is  an  essential  factor  for  building 
successful  smart  communities. 


Keywords 


1 Introduction

Spatial  planning,  as  an  interdisciplinary  practice  of  managing  the  development 
of  space  in  its  physical,  functional  and  socio-economic  dimensions,  aims  to 
provide  efficient,  economically  viable,  just  and  sustainable  space  arrangements. 
It  is  traditionally  a  competence  of  a  state,  regional  or  local  authority,  and  usu¬ 
ally  involves  a  number  of  actors  and  institutions. 

In  the  last  few  decades  a  stronger  emphasis  has  been  placed  on  the  involve¬ 
ment  of  the  community  and  the  users  of  space  in  urban  planning  procedures. 
In  part  this  has  arisen  from  the  general  democratisation  of  the  processes  in  con¬ 
temporary  societies  in  many  Western  countries,  but  it  has  also  emerged  out  of 
a  need  to  avoid  conflicts  between  opposing  parties,  which  often  have  contrary 
interests  in  space  (Arnstein,  1969;  European  Commission,  2003;  McTague  and 
Jakubowski,  2013;  Cerar,  2014). 

Prior  to  the  widespread  diffusion  of  new  Information  and  Communication 
Technologies  (ICT),  public  participation  was  largely  understood  as  a  form  of 
public  commenting  on  already  prepared  plans,  while  emerging  technologies 
have  opened  up  new  and  innovative  ways  of  realising  the  active  involvement  of 
the  wider  public  in  spatial  planning  (Bizjak,  2012).  Opportunities  have  arisen 
in  different  fields,  e.g.  improving  the  communication  between  authorities  and 
citizens,  providing  more  accurate  and  up-to-date  databases  on  the  current  state 
of territorial conditions, and collecting the ideas and visions of different
stakeholders for future developments (Berntzen et al., 2005; Brabham, 2009; Seltzer
and  Mahmoudi,  2013). 

As  a  dynamic  and  complex  socio-technical  process,  spatial  planning  may 
entail  multi-faceted  paradigms  originating  in  a  variety  of  workflows  in  prac¬ 
tice.  The  aim  of  this  chapter  is  to  use  the  concept  of  Geodesign  (Steinitz,  2012), 
which  is  one  of  many  possible  ways  of  approaching  spatial  planning,  to  explore 
the  opportunities  for  exploiting  georeferenced  User-Generated  Content  (geo- 
UGC)  in  spatial  planning.  We  can  differentiate  between  two  main  categories 
of  geo-UGC  of  particular  interest  in  spatial  planning,  either  as  an  information 
resource  or  as  a  communication  platform,  or  both:  Volunteered  Geographic 
Information  (VGI),  which  is  geo-UGC  purposely  collected  by  a  group  of  users 
for  a  given  purpose  (e.g.  OpenStreetMap.com);  and  Social  Media  Geographic 
Information (SMGI), which is geo-UGC collected passively (e.g. Twitter.com;
instagram.com)  or  actively  (e.g.  fixmystreet.org;  projectnoah.org;  carticipe.net) 
on  social  networking  platforms.  In  the  next  section,  the  Geodesign  approach 
is  outlined,  along  with  the  opportunities  for  effective  use  of  geo-UGC.  This  is 
followed  by  a  set  of  case  studies  from  the  authors,  which  illustrate  how  geo- 
UGC  has  been  used  in  planning,  relating  these  examples  to  different  stages  in 
the  Geodesign  approach.  Finally,  we  consider  how  VGI  and  SMGI  can  support 
‘smart  cities’  initiatives. 


2  The  Geodesign  Approach:  Opportunities  Arising  from 
VGI  and  SMGI 

In  the  last  decade,  the  term  Geodesign  has  gained  popularity  among  a  grow¬ 
ing  number  of  spatial  planners,  landscape  architects  and  Geographic  Informa¬ 
tion  Systems  (GIS)  scholars,  formalising  an  innovative  approach  to  planning 
and  design  deeply  rooted  in  geographic  analysis  and  at  the  same  time  able  to 
foster collaboration in decision-making. Geodesign may be defined as an integrated
process, informed by environmental sustainability appraisal, that aims
to  address  complex  problems  related  to  territorial  and  environmental  issues 
and  to  social  and  economic  matters  (Dangermond,  2010).  The  main  novelty 
in  the  Geodesign  approach  is  the  extensive  use  of  digital  spatial  data  and  pro¬ 
cessing  and  of  communication  resources  such  as  ICT  and  GIS,  aimed  at  eas¬ 
ing  the  integration  of  societal  and  scientific  knowledge  in  planning,  design 
and decision-making (Ervin, 2011). Current technologies may be considered
mature enough to exploit ICT support in spatial planning processes, overcoming
the barriers that in the past limited the use of new technologies in practice
(Göçmen and Ventura, 2010). Additionally, ICT, the Internet and, more
recently,  Web  2.0  technologies  are  increasingly  channeling  digital  Geographic 
Information  (GI)  into  the  daily  lives  of  a  growing  number  of  users.  This  phe¬ 
nomenon  is  leading  to  a  paradigmatic  shift  in  the  contents  and  characteristics 
of  GI,  as  well  as  in  its  modes  of  production  and  dissemination  (Elwood  et  al., 
2012).  In  the  spatial  planning  domain,  this  unprecedented  wealth  of  digital  GI 
provides  great  opportunities  for  advances  in  methodologies  such  as  Geode¬ 
sign,  fostering  opportunities  for  supporting  design,  analysis  and  decision¬ 
making  processes.  Most  of  the  opportunities  arising  for  innovation  emerge 
from  the  avalanche  of  spatial  big  data,  which  Web  2.0  technologies  are  making 
available  to  the  wider  public. 

In  the  last  two  decades,  developments  in  Spatial  Data  Infrastructures  (SDIs) 
have  enabled  access  to  digital  GI  produced  and  maintained  by  public  or  private 
institutions  for  public  or  business  purposes.  In  Europe,  the  implementation  of 
Directive 2007/2/EC, establishing a shared Infrastructure for Spatial Information
in Europe (INSPIRE), fostered the development of National and Regional
SDIs  in  the  Member  States,  allowing  the  public  access  and  reuse  of  available 
official information, or Authoritative Geographic Information (A-GI), according
to common data, technology and policy standards. In addition, several platforms,
continuously flourishing through the Internet as a result of Web 2.0
technologies, are supporting the production and diffusion of User-Generated
Content  (UGC),  which  often  has  a  geographic  reference  embedded,  potentially 
transforming  the  Web  into  a  big  warehouse  of  spatial  data  (Elwood  et  al.,  2012). 
Spatial  UGC  is  commonly  labelled  as  VGI,  emphasising  the  voluntary  activi¬ 
ties  of  users  to  collect  and  contribute  information  related  to  the  geographic 
world  (Goodchild,  2007).  In  spatial  planning,  VGI  may  supply  both  experi¬ 
ential  knowledge  from  local  communities  and  expert  knowledge  from  profes¬ 
sionals  in  a  bottom-up  approach,  e.g.  through  citizen  science  initiatives.  SMGI, 
which  is  a  subset  of  UGC  (Campagna,  2014),  is  spatial  information  produced 
and  shared  through  social  network  sites,  and  may  allow  for  the  collection  of 
quantitative  GI  related  to  a  study  area  but  also  of  qualitative  information  con¬ 
cerning  the  perceptions  of  users  about  phenomena  in  space  and  time.  Indeed, 
SMGI  is  different  from  traditional  common  vector  spatial  datasets  such  as  A-GI 
supplied  by  institutional  SDIs,  which  exclusively  feature  spatial  and  thematic 
information:  the  SMGI  data  model  features  spatial,  temporal  and  multimedia 
dimensions  (i.e.  image,  text,  video  and  audio),  as  well  as  a  user  dimension, 
including  specific  information  about  the  user  profiles.  Furthermore,  in  certain 
cases,  the  SMGI  data  model  also  includes  a  preference  dimension,  i.e.  SMGI 
appreciation  expressed  by  the  social  network  community  by  means  of  scores, 
stars  or  likes/dislikes,  thus  widely  expanding  the  range  of  analytical  opportuni¬ 
ties  for  planners  and  analysts  (Campagna  et  al.,  2015).  A  comparison  between 
the  SMGI  and  traditional  A-GI  data  models  is  shown  in  Figure  1. 
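
The contrast between the two data models can be made concrete with a short
sketch. The Python classes below are illustrative only: the class names, the
field names and the helper `dimensions` are assumptions introduced here, and
they are not taken from Campagna (2016) or from the API of any social network.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AGIFeature:
    """Authoritative record: spatial and thematic information only."""
    lon: float
    lat: float
    theme: dict  # thematic attributes, e.g. {"land_use": "park"}


@dataclass
class SMGIRecord:
    """Social media record (hypothetical layout) with the extra dimensions."""
    lon: float                      # spatial dimension
    lat: float
    timestamp: datetime             # temporal dimension
    text: str = ""                  # multimedia dimension (text)
    media_urls: list = field(default_factory=list)  # image, video, audio
    user_id: str = ""               # user dimension
    user_profile: dict = field(default_factory=dict)
    likes: Optional[int] = None     # preference dimension, only on some platforms


def dimensions(record) -> set:
    """Return the names of the data-model dimensions that a record carries."""
    dims = {"spatial"}
    if isinstance(record, AGIFeature):
        dims.add("thematic")
        return dims
    dims.add("temporal")
    if record.text or record.media_urls:
        dims.add("multimedia")
    if record.user_id:
        dims.add("user")
    if record.likes is not None:
        dims.add("preference")
    return dims


post = SMGIRecord(
    lon=9.11, lat=39.22,
    timestamp=datetime(2016, 5, 1, tzinfo=timezone.utc),
    text="Nice square", user_id="u1", likes=12,
)
parcel = AGIFeature(lon=9.11, lat=39.22, theme={"land_use": "park"})
print(sorted(dimensions(post)))
print(sorted(dimensions(parcel)))
```

Keeping the preference field optional reflects the statement above that this
dimension is present only in certain cases.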

Fig. 1: Comparison between the A-GI data model and the SMGI data model
(Adapted from Campagna, 2016).

The general SMGI data model may foster advances in spatial planning
methodologies and may be a valuable complement to traditional A-GI that can
support several stages of the Geodesign process. To formalise the Geodesign
approach, Steinitz (2012) proposed a methodological framework that relies
on six models: representation, process, evaluation, change, impact and
decision models. These are iteratively implemented to design future development
alternatives and to identify their potential consequences by means of a
territorial context description, an analysis of the dynamics and an evaluation
of the impacts. The first three models describe the present situation of the
territorial context by (1) representing the environmental system,
(2) explaining its evolution and (3) identifying opportunities and threats
that may arise from the current situation. Conversely, the last three models
(4) define potential alternatives for transforming the system, (5) assess the
potential beneficial or harmful impacts of these alternatives on environmental
and human systems, and finally (6) support stakeholders during the
decision-making process.

VGI  and  SMGI  may  thus  be  used  to  complement  the  availability  of  official 
information  for  the  implementation  of  all  the  Geodesign  models,  supplying 
useful  societal  data.  In  the  representation  model,  SMGI  may  be  used  to  facili¬ 
tate  the  description  of  a  geographic  context,  providing  experiential  knowledge 
that is usually overlooked in official information and integrating A-GI with a
pluralist vision of geographic phenomena, which may be used to identify social
and cultural dynamics affecting the area. For example, SMGI from several
Location-Based  Social  Networks  (LBSNs)  has  been  used  to  identify  the  most 
appreciated  Points  of  Interest  (POIs)  and  landmarks  in  a  study  area  (Jankowski 
et  al.,  2010),  the  pedestrian  paths  in  the  historical  centre  of  a  city,  the  neigh¬ 
bourhoods  featuring  the  lowest  number  of  services  and  the  different  land  uses 
in an urban environment (Frias-Martinez et al., 2012), and to classify urban
areas  (Noulas  et  al.,  2011). 

Regarding  the  development  of  process  models,  SMGI  may  be  used  to  inves¬ 
tigate  how  detected  phenomena  evolve  over  time  thanks  to  the  real-time  sup¬ 
ply  of  information,  which  may  be  used  for  monitoring  and  to  feed  predictive 
models  for  studying  future  trends  and  dynamics.  SMGI  may  also  be  extracted 
and  analysed  for  different  periods  from  different  social  networks,  investigating 
first  whether  current  phenomena  were  already  present  in  the  past  and  secondly 
if  the  potential  factors  affecting  these  phenomena  persist,  in  order  to  evaluate 
the  future  situation.  Similarly,  users’  preferences  about  urban  mobility  or  cul¬ 
tural  dynamics  may  be  elicited  from  SMGI  with  the  aim  of  feeding  agent-based 
models  that  can  simulate  individual  behaviours. 

In  the  evaluation  model,  SMGI  may  be  used  to  assess  the  current  situation  of 
the  geographic  area,  due  to  the  preferences,  opinions  and  behaviours  of  users, 
which  are  embedded  in  this  source  of  information.  For  instance,  SMGI  may  be 
extracted  for  studying  the  movements  of  users  in  urban  environments  (Jankowski 
et  al.,  2010),  the  utilisation  rates  of  public  spaces  (Torres  and  Costa,  2014)  and 
the  neighbourhood  perceptions  of  users  (Massa  and  Campagna,  2014),  as  well  as 
the dynamics of different population groups (Longley et al., 2015).

Furthermore,  social  networks,  representing  a  means  to  gain  useful  insights 
about  the  social  and  cultural  dynamics  of  an  area,  may  support  the  develop¬ 
ment  of  alternative  scenarios  in  the  Geodesign  change  model,  and,  at  the  same 
time, they may be used to actively involve local communities during planning
and design (Eraranta et al., 2015). In addition, SMGI may be useful in the
Geodesign impact model to assess the potential effects of alternatives on the
territory, since change scenarios can be presented to the local community
and feedback collected using a participatory planning approach (Rantanen and
Kahila, 2009).

Finally,  despite  the  difficulties  in  transposing  the  experiential  knowledge  of 
local  communities  into  practice  (Nonaka  and  Takeuchi,  1995),  SMGI  might 
be  used  to  foster  a  communicative  process  among  participants  in  the  decision 
model,  wherein  the  mutual  integration  of  expert  and  experiential  knowledge 
is  a  crucial  step  (Khakee  et  al.,  2000)  to  build  a  shared,  sustainable  and  demo¬ 
cratic  development  process  for  the  territory.  Commonly,  a  local  community’s 
experiential  knowledge  is  considered  exclusively  an  opinion  in  planning  pro¬ 
cesses  (Fischer,  2000);  however,  the  technical  knowledge  of  experts  may  not 
be sufficient to properly guide decision-making processes (Lindblom, 1990).
Hence,  the  integration  of  A-GI  and  SMGI  may  support  the  decision  model, 
and  may  foster  the  development  of  more  transparent,  pluralist  and  democratic 
decision-making. 

In  the  next  section,  selected  case  studies  that  we  carried  out  will  be  briefly 
outlined  to  demonstrate  the  value  of  SMGI  at  different  stages  of  the  planning 
process,  using  the  Geodesign  framework  as  a  reference. 


3  Case  Studies  on  the  Value  of  VGI  and  SMGI  in  Spatial 
Planning  and  Design 

3.1  Representation  Model 

Representation  of  geographic  information  is  extremely  important  for  planners 
and  citizens.  Both  of  them  use  visualisation  methods  to  explore  the  real  world 
and  as  a  basis  for  analysing  different  scenarios  based  on  spatial  data.  Visuali¬ 
sation  is  one  of  the  possible  representations  for  VGI,  and  probably  the  most 
powerful  one.  Geovisualisation  explores  geospatial  information  and  supports 
decision-making  processes  in  spatial  planning. 

One  innovative  example  of  representation  is  the  interactive  visualisation  of 
OpenStreetMap  (OSM),  which  allows  users  to  upload  quantitative  and  quali¬ 
tative  data  in  a  Web-based  GIS,  as  was  the  case  in  the  GeoCampPACA  event. 
GeoCampPACA2016 was a mapping party organised by OSM France, the
Provence-Alpes-Côte d’Azur (PACA) French region and the region’s centre for
geoinformation, CRIGE (Figure 2). The aim of this event was to survey
different modes of transport, such as pedestrian, bicycle, car, bus,
tram and train routes, including infrastructure, equipment, services, etc., and
to represent the information in cartographic form. This two-day event was a
real participatory mapping operation, open to all students in geography and
GIS of the PACA region. The first day was dedicated to OSM protocols
and basic notions of crowdsourcing and GIS, while the second day was devoted
to practical and field activities in the main train stations of the region.
The event facilitated the creation of open data available on the OSM portal,
while allowing participants to gain a better understanding of their surrounding
environment.

Fig. 2: Images from the GeoCampPACA2016 web mapping party organised by
OpenStreetMap in France. ©OpenStreetMap contributors.


3.2  Process  Model 

As  mentioned  earlier,  the  Geodesign  process  model  concerns  the  understand¬ 
ing  of  current  territorial  dynamics.  This  model  will  be  illustrated  with  two 
examples.  The  first  is  a  case  study  of  volunteered  urban  cycling  information  via 
GPS  devices,  which  demonstrates  how  VGI  can  help  planners  monitor  current 
behaviour  and  preferences  in  movement  and  transport  dynamics.  The  second 
case  study  shows  how  the  daily  spatial  practices  of  homeless  people  can  be  bet¬ 
ter  comprehended  through  the  use  of  VGI. 

Rising  motorisation  rates  in  Europe  and  related  environmental  issues  have 
created  a  demand  for  new  urban  planning  and  design  paradigms  in  relation  to 
urban transportation (Eurostat, 2012; Knoflacher, 2007; Zubelzu and Fernandez,
2016). The new spatial planning paradigms advocate a change in the proportion
of means of mobility in favour of non-motorised and public transportation and
at the expense of personal motorised traffic. Within these endeavours, urban
cycling is gaining momentum, and new strategies have been developed to
accommodate urban cycling in existing cities.

One  of  the  related  urban  planning  issues  is  the  improvement  of  the  existing 
and  provision  of  new  cycling  infrastructures.  Contemporary  smart  approaches, 
however,  do  not  deal  with  the  infrastructure  as  a  physical  element,  but  deal 
with  it  solely  in  relation  to  perceptual  and  behavioural  patterns,  i.e.  how  peo¬ 
ple  tend  to  perceive  and  use  it;  the  main  aim  is  to  provide  infrastructure  that 
will  be  efficient  and  safe  and  to  encourage  enough  people  to  use  it  regularly.  A 
wide  range  of  approaches  have  been  developed  to  help  understand  what  kind 
of  cycling  infrastructure  is  preferred  and  demanded  by  users  in  contemporary 
cities,  and  VGI  is  playing  an  increasingly  important  role  in  these  developments 
(Latham  and  Wood,  2015;  Yeboah  and  Alvanides,  2015;  Winters  et  al.,  2016). 

One such attempt is CyCity, a research programme by the Swedish governmental
agency Vinnova that aims to improve knowledge of urban cyclists’ preferences
in route choices (Envall and Koucky, 2013). Through a combination of GPS
devices and online questionnaires, each participating urban cyclist provided
valuable information for the planning and (re)design of cycling path networks
in the cities of implementation (Ljubljana in Slovenia and Linköping in
Sweden). For a limited time, participants were given user-friendly GPS devices
and asked to record every cycling route they made in the city, as well as to
fill out a questionnaire providing qualitative data on the cycling routes
(Tominc et al., 2012). Even though the GPS technology proved to be neither very
precise nor very accurate (e.g. the mapped polylines overlapped with built
blocks), the research revealed considerable potential to meet the needs of
urban planning (Figure 3), namely in the following aspects:

•  A sufficiently large number of mapped cycling tracks clearly indicates where
in the city the cycling trips densify, as well as where they are non-existent.
The densely cycled areas may be regarded as potential locations to place and
develop programmes that appeal to cyclists, which may generate new urban
activities, much longed for in urban regeneration processes.

•  Areas  that  have  no  records  of  tracks  at  all  should  be  observed  in  detail  to 
determine  the  reasons  why  and  the  possible  solutions  for  increasing  cycling 
opportunities. 

•  The  cross-interpretation  of  GPS  tracks  and  qualitative  data  offers  an  exclu¬ 
sive  insight  into  how  different  sections  of  the  cycling  network  are  perceived 
by  users  and  what  their  preferences  are  when  choosing  their  cycling  routes. 
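
The first two points above amount to counting how many recorded trips cross
each part of the city. A minimal sketch of such a count is given below,
assuming that the tracks have already been projected to metric coordinates and
that a regular square grid is an acceptable unit of analysis. The function
names, the cell size and the sample coordinates are hypothetical and do not
reproduce the CyCity analysis itself.

```python
from collections import Counter


def cell_of(x, y, origin, cell_size):
    """Map a projected coordinate (metres) to a (column, row) grid index."""
    col = int((x - origin[0]) // cell_size)
    row = int((y - origin[1]) // cell_size)
    return col, row


def trip_density(tracks, origin, cell_size):
    """Count, for each grid cell, how many distinct trips pass through it."""
    counts = Counter()
    for track in tracks:
        # A set ensures a trip is counted once per cell, however many
        # GPS fixes it leaves there.
        visited = {cell_of(x, y, origin, cell_size) for x, y in track}
        counts.update(visited)
    return counts


def empty_cells(counts, n_cols, n_rows):
    """List the cells of an n_cols x n_rows study area with no recorded trips."""
    return [(c, r) for c in range(n_cols) for r in range(n_rows)
            if counts[(c, r)] == 0]


# Two invented trips on a 3 x 2 grid of 50 m cells.
tracks = [
    [(10, 10), (60, 10), (110, 10)],
    [(60, 10), (60, 60)],
]
counts = trip_density(tracks, origin=(0, 0), cell_size=50)
print(counts.most_common(1))      # the most densely cycled cell
print(empty_cells(counts, 3, 2))  # cells that deserve a closer look
```

Aggregating to cells rather than to individual street segments is one way to
tolerate some positional error in the GPS fixes, although the cell size then
has to be chosen with that error in mind.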

Urban  transportation,  as  one  of  the  most  dynamic  and  changeable  features  of 
urban  settlements,  is  certainly  a  planning  sector  that  can  greatly  benefit  from  the 
usage  of  VGI,  where  urban  cycling  is  just  one  example.  As  the  main  mission  of 
urban  settlements  is  to  provide  settings  for  human  interactions  and  exchanges, 
it is important to reveal people’s perceptions, expectations and desires in
various fields of urban life. In this respect, the CyCity initiative showed
that VGI
can  provide  a  valuable  source  of  direct  information. 

Another  example  of  how  VGI  has  been  used  to  shed  light  on  the  spatial  prac¬ 
tice  of  local  communities  is  one  launched  in  2014  in  Denmark.  In  the  city  of 
Odense,  a  project  was  initiated  whereby  the  homeless  population  in  the  city 
was  invited  to  participate  in  monitoring  their  daily  spatial  practices  using  port¬ 
able  GPS  technology.  Homeless  people  and  other  vulnerable  groups  are  under¬ 
represented  in  the  planning  and  political  apparatus  of  the  modern  city,  so  the 
physical  planning  of  the  city  is  not  influenced  by  these  groups,  despite  the  fact 
that  group  members  are  often  very  present  in  the  city,  and  often  with  no  place 
else  to  turn  to  than  the  streets. 

Much  of  the  research  to  date  has  investigated  homelessness  and  homeless 
mobility  in  the  city  (e.g.  Wolch  et  al.,  1993;  Cloke  et  al.,  2008),  as  well  as  in 
the  countryside  (Cloke  et  al.,  2003).  The  spatial  practice  of  homeless  people 
has  also  been  the  topic  of  numerous  studies.  Some  studies  have  focused  on 
homelessness  among  immigrant  groups  in  Europe  (e.g.  Pezzoni,  2011)  while 
others  have  focused  on  gender  issues  (e.g.  Crystal,  1984)  involved  in  homeless¬ 
ness.  However,  only  very  few  studies,  if  any,  can  be  identified  that  utilise  con¬ 
temporary  location  technology  in  relation  to  monitoring  the  spatial  practice  of 
homeless groups.

Fig. 3: An example of analyses of the cycled tracks as recorded during the
CyCity research in Ljubljana.

In the Odense project, data are collected twice a year. A number of GPS
devices  are  left  in  one  of  the  shelters  operated  by  the  Blue  Cross  NGO  in  col¬ 
laboration  with  the  municipality.  The  homeless  people  are  encouraged  to  put 
a  GPS  device  in  their  pockets  and  to  hand  the  GPS  back  the  next  day.  It  is,  to 
some  extent,  a  leap  of  faith  for  the  homeless  to  participate  in  such  an  enter¬ 
prise,  as  many  doubts  and  fears  about  the  use  of  the  data  can  be  raised;  here, 
the  close  collaboration  with  officials  from  the  municipality  and  high  ethical 
standards  (F.  Harvey,  2013)  are  paramount,  as  the  data  contributors  have  to  be 
assured  that  data  on  their  spatial  patterns  are  not  revealed  to  any  third  party. 
After  one  day  of  carrying,  the  GPS  units  are  collected  and  the  data  are  gathered 
and  analysed. 

To  date,  the  project  has  implemented  three  data  collection  routines,  and 
already  the  results  are  being  used  by  officials  in  the  municipality  as  part  of  the 
planning  process.  Data  on  mobility  patterns  have  revealed  new  bottlenecks  in 
the  spatial  practices  of  the  homeless;  confluences  of  mobility  have  been  identi¬ 
fied,  and  places  for  resting  and  meeting  up  have  been  confirmed  or  investi¬ 
gated  as  part  of  the  data  analysis.  The  results  from  these  analyses  and  the  new 
insights  into  homeless  mobility  are  further  being  used  in  the  physical  planning 
of  the  city  of  Odense  in  order  to  identify  places  to  erect  new  structures  such  as 
shelters  and  roofed  open  spaces  for  the  homeless  and  other  vulnerable  groups. 
The  results  are  also  being  considered  whenever  new  projects  are  initiated  in 
the  city. 

As  such,  the  Odense  project  highlights  the  fact  that  locational  data  on  vulner¬ 
able  groups  can  be  collected  in  a  volunteered  data  collection  regime  and  can  be 
used  very  effectively  as  a  means  to  give  voice  to  a  group  of  citizens  that  does  not 
traditionally  get  heard  in  the  physical  planning  of  the  city.  This  type  of  informa¬ 
tion,  and  empowerment,  would  not  be  possible  without  data  being  provided  by 
contemporary  techniques;  users  volunteering  the  data;  and  ethical  procedures 
and  analysis  protocols  to  structure  the  understanding  and  use  of  the  results  in 
a  manner  that,  on  the  one  hand,  meets  the  requirements  of  the  planning  organs 
of  the  municipality  while,  on  the  other  hand,  makes  sense  to  the  vulnerable 
groups  volunteering  the  data. 


3.3  Evaluation  Model 

Another  example  of  the  considerable  value  of  VGI  for  urban  planning  is  in 
the field of the (re)design and (re)establishment of the quality of open urban
public  spaces.  Open  public  spaces  are  the  most  contested  spaces  of  contempo¬ 
rary  cities,  as  they  are  common  spaces  and  different  users  and  interest  groups 
have  different  conceptions  and  aspirations  related  to  them.  At  the  same  time 
they  are  the  places  that  connect  the  urban  population  in  real  space  and  time 
and  play  a  crucial  role  in  the  socio-economic  dynamics  of  cities  (Madanipour 
et al., 2014).

In order to reveal people’s spatial perceptions of urban public spaces, various
techniques have been developed, from traditional mental mapping techniques
inspired by Lynch’s (1960) work to a variety of contemporary IT-supported
community techniques (Davis, 2007; Evans-Cowley, 2010; Bizjak, 2012).

The  perceptual  dimension  of  space,  namely  emotions  related  to  concrete  spa¬ 
tial  arrangements,  proves  to  be  rather  difficult  to  grasp  in  a  form  that  could 
effectively  support  the  processes  of  spatial  planning;  it  is  personally  condi¬ 
tioned  and  varies  greatly  among  individuals.  Nevertheless,  as  technically  sup¬ 
ported  VGI  allows  large  samples  to  be  collected,  this  aspect  of  urban  planning 
may  well  find  a  way  onto  urban-planning  agendas  of  the  future,  if  the  commu¬ 
nication  tools  are  adjusted  to  the  knowledge  and  skills  of  the  general  public. 
A  concrete  example  is  the  project  outlined  in  Healey  and  Ramaswamy  (2016), 
which  explores  possibilities  to  estimate  and  visualise  sentiments  through  text 
mining  methods,  starting  from  short,  incomplete  text  snippets  on  Twitter.  Col¬ 
lections  of  real-time  tweets  are  visualised  in  various  ways:  by  sentiments,  by 
topic,  by  location,  by  frequent  terms  and  their  co-occurrence,  etc.  Another 
very  appropriate  medium  to  reveal  one’s  perception  of  space  is  photography 
and  the  descriptions  attached  to  photographs.  An  example  that  has  revealed 
the  attitudes  and  perceptions  of  inhabitants  regarding  their  immediate  living 
environment  through  photography  is  the  Human  Cities  (2016)  online  project 
(Figure  4).  One  of  its  many  activities  is  a  participatory  collection  of  urban 
neighbourhood photographs. The project is based on a conviction that it is
important to reveal the values that local inhabitants share in order to propose
sensible urban design improvements to neighbourhoods. The Human Cities
(2016) online photograph contest runs as a web blog as well as a mobile
phone  app  and  has  been  organised  with  pre-defined  thematic  categories,  e.g. 
Most  pleasant  place  in  my  neighbourhood;  Professions  in  my  neighbourhood; 
My  neighbour;  Borders  of  my  neighbourhood;  Shared  values  in  my  neigh¬ 
bourhood.  By  analysing  the  photographs  in  each  category  and  their  subtitles, 
planners  are  given  a  deeper  insight  into  the  otherwise  hidden  layer  of  local 
environments,  i.e.  the  interpretations  of  local  places  by  users,  which  would 
not traditionally be taken into consideration in urban (re)design processes or
would  have  to  be  undertaken  through  time-consuming  interviewing. 


3.4  Change,  Impact  and  Decision  Models 

According  to  Simon  (1969),  any  design  process  entails  devising  courses  of  action 
aimed  at  changing  existing  situations  into  preferred  ones.  In  order  to  achieve  a 
design,  Simon  (1969)  proposes  a  three-tier  iterative  workflow  of  intelligence  (i.e. 
the  knowledge  base  is  created),  design  (i.e.  the  alternative  possible  future  courses 
of  action  are  devised)  and  choice  (where  the  preferable  option  is  selected  for 
implementation). These definitions and this approach can be considered
applicable to the majority of spatial planning (and Geodesign) processes.

While previous case studies gave evidence of how VGI and SMGI can be
used  as  information  resources  in  the  intelligence  phase  (i.e.  the  representa¬ 
tion,  process,  and  evaluation  models  in  Geodesign),  the  following  example 
shows  how  a  Web-based  collaborative  platform  with  social  networking  fea¬ 
tures  can  be  used  to  involve  a  large  number  of  users  in  collecting  volunteered 
content  about  design  and  choice  (i.e.  the  change,  impact  and  decision  models 
in  Geodesign). 

While social media have been acknowledged as a potentially powerful
means for engineering design and communication (Gopsill et al., 2013) and
for supporting design studio work (Güler, 2015), until recently there have not
been many Web-based platforms available to support collaborative
planning and design. One example of such a platform is geodesignhub.com,
developed by Ballal and Steinitz (2015), which implements
the Steinitz Geodesign Framework (Steinitz, 2012). This platform, which
has  been  successfully  applied  in  a  growing  number  of  Geodesign  workshops 
(Rivero  et  al.,  2015;  Nyerges  et  al.,  2016;  Campagna  et  al.,  2016),  allows  for 
crowdsourcing  of  spatial  data  diagrams  (i.e.  georeferenced  lines  and  poly¬ 
gons)  representing  design  options  (i.e.  projects  and  policies)  by  a  number  of 
users  (usually,  but  not  necessarily,  around  30).  After  the  project  and  policy 
diagrams  are  collected  (see  Figure  5  for  examples),  the  users  can  combine 
them  in  complex  design  syntheses  that  can  be  compared  and  evaluated 
against  an  impact  model  highlighting  positive  and  negative  impacts  as  well 
as  costs  (Figure  6).  The  platform  also  features  a  number  of  tools  supporting 
negotiations so that the users participating in a workshop (which can be
virtual, and held in the same or different places and at the same or different
times) can eventually find consensus on a common shared design.
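
The comparison of syntheses described above can be illustrated with a small
sketch. It assumes, purely for illustration, that every diagram carries an
impact score per system and a cost, and that a synthesis is evaluated by simple
addition. The class `Diagram`, the function `evaluate`, the scoring scale and
all sample values are hypothetical; this is not the impact model or the API of
geodesignhub.com.

```python
from dataclasses import dataclass


@dataclass
class Diagram:
    """A crowdsourced design option (hypothetical structure)."""
    name: str
    kind: str      # "project" or "policy"
    impacts: dict  # assumed score per system, from -2 (negative) to +2 (positive)
    cost: float    # assumed cost in arbitrary units


def evaluate(synthesis):
    """Sum the per-system impact scores and the costs of a list of diagrams."""
    totals = {}
    for diagram in synthesis:
        for system, score in diagram.impacts.items():
            totals[system] = totals.get(system, 0) + score
    total_cost = sum(diagram.cost for diagram in synthesis)
    return totals, total_cost


# Invented diagrams, combined into two alternative syntheses.
bike = Diagram("Cycle lanes", "project", {"transport": 2, "ecology": 1}, 3.0)
mall = Diagram("Retail park", "project",
               {"transport": -1, "ecology": -2, "commerce": 2}, 8.0)
belt = Diagram("Green belt", "policy", {"ecology": 2, "commerce": -1}, 0.5)

for label, synthesis in {"A": [bike, belt], "B": [bike, mall]}.items():
    impacts, cost = evaluate(synthesis)
    print(label, impacts, cost)
```

Additive scoring is the simplest possible choice; a real impact model would
normally also account for where diagrams overlap spatially.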

The data stored in the project geodatabase of geodesignhub.com can be
considered as a design stemming from VGI. In addition, the data feature SMGI
characteristics for design diagrams, i.e. they have spatial, temporal, user and
preference dimensions, which can be further used to analyse the overall design
process and participant behaviours. This demonstrates a novel approach to
deriving value from crowdsourced design content in spatial planning and
(geo)design processes.


4  VGI  and  SMGI  to  Support  Smart  Cities  Initiatives 

The  examples  in  the  previous  section  aimed  to  support  the  idea  that  the 
increasing  wealth  of  digital  GI,  made  freely  available  through  the  Internet  to 
analysts,  planners  and  practitioners,  may  affect  the  current  practices  in  spatial 
planning. While this process may still be at an early stage, it is likely to
foster the development of ‘smart city’ strategies in the future. These strategies
rely  not  only  on  the  development  of  intelligent  technologies  but  also  on  smart 
governance models according to which strategic and management decisions
are informed by the real concerns and preferences of local communities as a
result of real-time monitoring of needs, requirements and movements in urban
environments.

Fig. 5: Project and policy diagrams of the Cagliari (Italy) metro area
crowdsourced at a Geodesign workshop in 2016 with geodesignhub.com. Each
diagram in the matrix represents a project or a policy proposed by the
participants during the crowdsourcing design exercise.

In  recent  years,  the  label  ‘smart  city’  emerged  as  a  broad  term  for  identify¬ 
ing  not  only  technology  and  smart  infrastructure  issues,  but  also  strategies 
suitable  to  address  societal  problems  generated  by  uncontrolled  urbanisation 
and  population  growth  in  cities.  Smart  city  strategies  rely  upon  the  Inter¬ 
net  and  Web  2.0  technologies  to  deal  with  several  challenges,  such  as  urban 
welfare,  quality  of  life,  societal  participation  and  environmental  sustainability 
(Schaffers  et  al.,  2010).  In  the  literature,  many  other  smart  city  definitions 
may  be  found  concerning  different  elements  that  contribute  to  the  success  of 
such initiatives. ICT represents the fundamental element to improve urban
livability and sustainability, as well as to ensure the integration, efficiency
and connections in the network of urban infrastructure and services (Washburn
and Sindhu, 2009). However, technology is also intended to foster the spatial
enablement of citizens by improving the access to, and the sharing and
integration of, spatial data within urban services (Roche et al., 2012).

Fig. 6: Comparing Geodesign syntheses in the Cagliari (Italy) metro area at a
Geodesign workshop in 2016 with geodesignhub.com.

Nevertheless,  the  technological  advances  offered  by  ICT  are  not  the  only  key 
elements  leading  to  the  success  of  smart  city  strategies,  which  also  depends  on 
the  managerial,  political  and  contextual  dimensions  of  a  city  (Nam  and  Pardo, 
2011).  Several  factors  of  the  political  dimension,  such  as  governance,  policy 
and  local  community,  may  play  a  central  role  in  the  development  of  such  strat¬ 
egies.  Indeed,  many  stakeholders  are  involved  in  the  implementation  of  smart 
city  strategies,  and  tight  relationships  between  these  actors  are  fundamental 
to  ensure  the  exchange  of  knowledge  in  order  to  avoid  the  failure  of  projects 
(Scholl  et  al.,  2009).  At  the  same  time,  local  communities  play  a  fundamental 
role  in  defining  smart  city  strategies  by  taking  into  account  their  own  needs  and 
opinions  in  order  to  guarantee  transparency,  democracy  and  pluralism  while 
avoiding  negative  effects  on  their  quality  of  life. 

In  light  of  the  above  considerations,  the  participation  of  local  actors  and  peo¬ 
ple  should  represent  an  essential  factor  for  tailoring  successful  smart  city  initia¬ 
tives.  In  this  regard,  the  unprecedented  wealth  of  digital  GI,  namely  SMGI  and 
VGI,  supplies  insights  not  only  about  opinions,  needs,  perceptions  and  move¬ 
ments  of  local  communities  in  the  urban  environment  but  also  about  design 
requirements  and  strategies,  and  may  result  in  unprecedented  opportunities 
for  leading  the  development  of  smart  city  strategies,  taking  into  account  the 
real  requirements  of  multiple  stakeholders  and  of  the  local  community  and  the 
people  living  in  a  place. 


5  Conclusions 

To  conclude,  let  us  remind  ourselves  of  the  concept  of  the  Right  to  the  city, 
addressed  by  D.  Harvey  (2008:  23)  as  follows:  ‘The  right  to  the  city  is  far  more 
than  the  individual  liberty  to  access  urban  resources:  it  is  a  right  to  change  our¬ 
selves  by  changing  the  city.  It  is,  moreover,  a  common  rather  than  an  individual 
right  since  this  transformation  inevitably  depends  upon  the  exercise  of  a  collec¬ 
tive  power  to  reshape  the  processes  of  urbanisation.  The  freedom  to  make  and 
remake  our  cities  and  ourselves  is,  I  want  to  argue,  one  of  the  most  precious  yet 
most  neglected  of  our  human  rights’. 

As shown in this chapter, it is realistic to foresee that broader and more
pluralist knowledge of places, embedded in VGI and SMGI, will become available
in the near future. This knowledge might be put to good use by developing
advanced technological solutions that integrate official and experiential
information with an urban sensor data infrastructure, fostering the
implementation of strategies informed and supported by local communities in a
bottom-up approach.
Such an approach must not be seen as beneficial only for citizens, but also
equally  for  the  authorities  at  different  levels,  and  in  particular  for  the  decision¬ 
makers  who  may  one  day  rely  upon  VGI  and  SMGI  to  discriminate  among 
different  alternatives,  paying  specific  attention  to  the  concerns  of  users  and 
selecting  among  the  solutions  that  will  satisfy  the  requirements  of  involved 
stakeholders.  VGI  and  SMGI  may  also  foster  scenarios  where  city  planners  are 
able  to  listen  to  the  local  community’s  concerns  and  preferences,  eventually 
interacting  with  the  community  through  new  technologies  and  communica¬ 
tion  channels  to  design  alternative  projects  and  to  assess  future  development 
options  through  a  constructive  and  participatory  dialogue.  This  may  sound 
rather  like  a  distant  promise,  but  it  represents  a  possible  future  development  in 
spatial  and  urban  planning  and  design,  thus  contributing  to  finally  making  the 
concept of the right to the city a realised one.

Abstract

This  chapter  explores  growing  and  important  trends  within  citizen  sensing, 
especially  those  linked  to  major  initiatives  that  form  citizens’  observatories  and 
address novel ways to engage citizens in science and environmental
policy-making. Based on an overview of existing and planned citizen
science and citizens’ observatories programmes, this chapter identifies areas
where citizen science and citizens’ observatories have actively contributed to,
and can be expected to see further development in, the formation of various
policies in Europe. Furthermore, this chapter considers the motivations for
developing  citizen  science  and  citizens’  observatories  and  how  these  initiatives 
can  contribute  to  awareness  raising  and  decision  support  systems.  We  address 
key  challenges  and  development  needs  for  policy-  and  decision-making  within 
the  context  of  widespread  and  accessible  citizen  science  and  of  the  activities  of 
citizen observatories.

1 Citizen Science and Citizens’ Observatories: A Growing
and  Important  Trend  to  Engage  Citizens  in  Science  and 
Environmental  Policy-making 

The  participation  of  citizens  in  environmental  monitoring  and  related  scientific 
activities  has  a  long  tradition,  dating  back  at  least  two  centuries  (Silvertown, 
2009; UWE, 2013). The present digital era gives people easy access to
advanced Information and Communication Technology (ICT) systems (e.g.,
social media platforms, mobile Internet, online gaming or smartphone apps),
enabling the public to participate in (scientific) projects on issues relevant
to their local environment and to easily access data and information about the
state of that environment. The collaborative power of these advanced ICT
systems is enormous, and can leverage a collective intelligence that has the
potential to change the way environmental policy-making and monitoring are
performed, as well as to raise citizens’ awareness of environmental issues more
effectively.
Numerous  collaborative  and  co-design  approaches  have  been  developed  and 
tested  during  the  last  decades.  In  this  chapter,  we  will  focus  on  two  methodolo¬ 
gies  that  are  well  suited  to  be  applied  in  the  context  of  ‘Mapping  and  the  citizen 
sensor’:  Citizen  Science  (CS)  and  Citizens’  Observatories  (COs),  which  both 
have  applicability  in  the  acquisition  of  spatial  data  through  Volunteered  Geo¬ 
graphic  Information  (VGI). 

In  this  section,  we  first  define  our  terms  (CS  and  CO)  and  discuss  how 
these  methodologies  have  become  increasingly  vital  within  science  and 
policy-making  (Sections  1.1  and  1.2).  We  then  distinguish  between  CS  and 
COs  in  general,  and  also  especially  in  relation  to  major  CS  and  CO  initiatives 
that  engage  citizens  in  science  and  environmental  policy-making. 


1.1 Citizen Science: Old Wine in New Bottles

Before diving directly into the world of CS, let us first review its definition.
Generally, the term describes the activities of non-scientist citizens that
contribute to scientific research. In the Oxford dictionary, we find the
following definition: ‘scientific work undertaken by members of the general
public, often in collaboration with or under the direction of professional
scientists and scientific institutions’ (OED, 2014). CS approaches are also
described as Public Participation in Scientific Research (PPSR). PPSR describes
all efforts of lay people directed towards their involvement in scientific
research activities (Shirk et al., 2012); it includes CS, but augments it with a
broader definition of participation that is not limited to collecting
scientifically relevant data. However, these definitions do not provide any
information on the extent to which citizens are involved in the scientific work,
whether they are only collecting data or whether they also participate in the
creation of the study. Based on relevant literature, we have created an overview
of the most prominent categories of public participation in scientific research
(Figure 1, adapted from Bonney et al., 2009) and visualised a range of popular
terms that are used in this context in a cloud tag (Figure 2).

Why do we need CS? CS offers many advantages. Due to restricted time and
limited monetary resources, scientists cannot always collect large amounts of
data or cover large geographic areas for both data collection and documentation
(Dickinson et al., 2010; Tulloch et al., 2013). For this reason, the help of
volunteers in collecting data can be extremely valuable. For example, since the
US Weather Service did not have enough resources to set up a countrywide
meteorological measuring network, it made use of volunteers all over the country
to help in the data collection. The resultant data form one of the most
important long-term datasets in the history of North America and have been used
for essential work within climate research, agriculture and development planning
(Vetter, 2011). This example shows that the collection of data over many
decades has led to the compilation of long-term data series, which are extremely
valuable for the work of modern science (Miller-Rushing et al., 2012).

Fig. 1: Categories of citizen science. Modified from Grossberndt and Liu (2016).
All rights reserved © Springer International Publishing Switzerland 2016.

Fig. 2: Cloud tag visualising terms related to citizen science.

Another  reason  for  the  application  of  CS  and  other  participatory  approaches 
is  to  increase  citizens’  awareness  of  problems  related  to  their  immediate  envi¬ 
ronment.  In  some  cases,  the  activities  can  also  result  in  greater  interest  and 
increased  engagement  in  these  issues.  Engaging  citizens  can  also  have  educa¬ 
tional  effects  and  increase  science  literacy  (Haklay,  2015). 

One might think that CS is a rather novel invention, considering that many
scientists prefer to keep to themselves in their ivory towers and the concept of
public participation is only gradually making its way into their thinking.
Surprisingly enough, the roots of CS can be traced at least as far back as the
18th century. At that time, a Norwegian bishop engaged a large number of
clergymen throughout the whole country and assigned them the task of collecting
observations and natural objects from all over Norway in order to assist
him in his research (Brenna, 2011). Throughout the centuries,
non-scientists/laypeople have often been engaged in assisting scientists in the
collection of data. A more recent example is the traditional Christmas Bird
Count in the USA, Canada and other Western countries, which began in 1900: in
the 2014/15 season, more than 72,000 volunteers participated in that programme
(LeBaron, 2015).

Nowadays, a large number of CS activities have been initiated and are still
ongoing, covering many different fields (see Section 2.2). The list of CS
programmes is long, and, during the last decades, CS activities have sprung
up like mushrooms all around the globe. What has caused this phenomenon?
There are several reasons. First and foremost, there have been rapid changes
within ICT; for example, easy Internet access, the emergence of Web 2.0
systems and the rise of social media have enabled increased engagement with the
public. Another aspect is the improvement and simplification of the collection,
management and storage of data. More and more people have access to
easy-to-use devices like smartphones and other mobile devices with GPS
positioning technology; this facilitates the involvement and connection of
citizens around the world. Collecting data or taking a picture and sending it to
a data server with the exact time and geographic position now takes seconds, not
hours or days. A second important reason for the emergence of CS initiatives
lies in changes in society. At least in Western countries, the level of
education amongst the public has been increasing. More leisure time and a
growing understanding of scientific concepts, as well as increased technical
skills, even for the youngest in society, are contributing factors to CS
initiatives. Thirdly, scientists have become more aware of the fact that citizen
participation in the collection of scientific data can also be beneficial, due
to resource limitations, as mentioned above. Recent study results indicate that
savings in labour cost per project can reach up to US$200,000 over the project’s
first 180 days, depending on the project (Sauermann and Franzoni, 2015).


1.2  Citizens’  Observatories:  A  New  Concept 

As early as the 1970s, P.K. Feyerabend suggested that it was time for a
democratisation of science; he claimed that ‘everywhere science is enriched by
unscientific methods and unscientific results’ (Feyerabend, 1970). Essentially,
he believed that the monopolisation of research by universities, corporations
and other large institutions was contrary to the best interest of science,
which, as we have seen, has a long history of public participation. However, in
spite of his attempts to redress the lack of citizens or non-scientists within
research, amateur participation was declining. This deficit was eventually
recognised, and, in order to promote more active participation by the public,
the EU first commissioned the SOCIENTIZE project (2012-2014) to create a
common forum for cooperation between e-Infrastructure providers and CS
infrastructure providers, including any end user with an interest in
contributing to the scientific process (Socientize, 2012-2014). The project
produced the Green Paper on Citizen Science, which helped to create a ‘roadmap’
for CS in Europe. This led to a series of further initiatives where CS was
incorporated in some form, especially within the development of the new concept
of the CO (see Section 2.1).

The term CO was first addressed in the EU FP7 Topic ENV.2012.6.5-1:
‘Developing community-based environmental monitoring and information
systems using innovative and novel earth observation applications’ (EC, 2014).
It is a term that is applied to a framework that combines participatory
community monitoring with monitoring by policy-makers, scientists and other
stakeholders. Typically, this is achieved via a technological system that may
include web portals, mobile technologies and sensors (Liu et al., 2014). The
term was further developed within five projects that were funded within the EU
FP7 Topic ENV.2012.6.5-1 (see Section 2.1). For example, in the CITI-SENSE
project, a CO for supporting community-based environmental governance has
been defined as ‘the citizens’ own observations and understanding of
environmentally related problems and in particular as reporting and commenting
on them within a dedicated ICT platform’ (Liu et al., 2014) and was tested in
nine cities in the field of air quality. In the WeSenseIt project, Ciravegna et
al. (2013) defined a CO as ‘a method, an environment and an infrastructure
supporting an information ecosystem for communities and citizens, as well as
emergency operators and policymakers, for discussion, monitoring and
intervention on situations, places and events.’ The CO in the WeSenseIt project
is therefore seen as an environment for implementing collaboration, as
infrastructure to validate the CO concept and as a method to demonstrate the
applicability of its outcome (Lanfranchi et al., 2013).

There  is  no  doubt  that  the  term  CO  has  become  popular  in  CS  programmes 
(especially  EU-funded  ones),  and  many  new  CO-related  initiatives  have  been 
created  at  different  levels.  Accordingly,  this  new  term  represents  a  growing  and 
important  trend  in  both  science  and  policy-making. 

In practice, all CO projects typically share a similar model, including the
main aspects needed to develop COs as a method for data collection. These
include engaging citizens in data collection, data interpretation and
information delivery. Put another way, the CO model (Figure 3) combines
(i) sequential aspects, (ii) interaction with citizens and other stakeholders,
(iii) data collection tools, and (iv) an ICT infrastructure that underlies the
CO framework and supports effective citizen participation.

A  set  of  sequential  aspects  (the  pyramid  within  Figure  3)  has  been  identi¬ 
fied  by  Liu  et  al.  (2014)  as  follows:  A)  identifying  what  citizens  want  and  what 
citizens  can  offer;  B)  exploring  what  products  and  services  a  CO  can  provide 
for  the  citizens;  C)  recruiting  and  retaining  citizens  to  participate  in  and  con¬ 
tribute  to  environmental  governance;  D)  providing  tools  that  support  citizens 
to  report  their  observations,  inferences  and  concerns;  and  E)  supplying  tools  to 
access/receive  information  on  the  environment  in  a  manner  that  is  both  easily 
understood  and  useful,  for  citizens  and  other  stakeholders,  including  policy¬ 
makers. 

The essential aspects of the interaction with citizens and other stakeholders
(who are represented by the five circles along the bottom outer open edge
in Figure 3) have been addressed in all existing CO models. A CO includes
observations from not just professionals and scientists, but also citizens.
An effective CO should enable two-way communication between citizens
and other stakeholders, potentially resulting in profound changes to local
environmental management processes, and, as such, should engage in social
innovation processes and outcomes (Wehn and Evers, 2015). For example,
the WeSenseIt project used social media and co-design approaches, exploring
citizens’ needs and providing a framework in which authorities and citizens
cooperate in sharing collective intelligence and participate in planning,
decision-making and governance regarding the water environment, including
flood risk (WeSenseIt, 2012-2016).

Fig. 3: A common model for citizens’ observatories programmes in practice.
Modified from Grossberndt and Liu (2016). All rights reserved © Springer
International Publishing Switzerland 2016.

The data collection tools (the two ovals along the outer open edge of
Figure 3) are highlighted in the existing CO models as well. For example,
the CITI-SENSE project engaged citizens in using low-cost micro-sensors to
monitor air quality in their surroundings (hard layer of data collection), and
interacted with citizens via various social media and mobile apps (soft layer
of data collection; CITI-SENSE, 2012-2016).

The  ICT  infrastructure  (the  large  oval  at  the  top  of  Figure  3)  is  an  essential 
part  of  the  CO  model  that  includes  boundary  services  with  sensors  and  apps, 
data  management  services,  data  storage  support  and  the  reusable  visualisation 
widgets  used  for  both  apps  and  web  portals.  Currently,  existing  CO  projects 
are  building  all  required  ICT  infrastructure  towards  a  systematic,  simple  and 
reusable  method  to  facilitate  the  setting  up  of  new  COs  in  various  environmen¬ 
tal  fields,  a  method  which  can  be  applied  by  communities  and  organisations  to 
overcome  their  challenges  regarding  the  specific  technical  ICT  skills  and  pro¬ 
gramming  knowledge  needed  to  create  the  necessary  server  infrastructure  and 
mobile  applications  (Zaman  et  al.,  2014). 


1.3  Citizen  Science  and  Citizens’  Observatories  - 
Commonalities  and  Differences 

As  mentioned  previously,  CS  is  a  novel  take  on  an  old  approach  and  is  generally 
described  as  ‘public  participation  in  scientific  research’.  COs  are  a  new  concept 
that  evolved  from  EU  policy  circles,  defining  the  combination  of  participatory 
community  monitoring,  technology  and  governance  structures  that  are  needed 
to  monitor,  observe  and  manage  an  environmental  issue  (Haklay,  2015). 

Both CS and COs involve citizens in scientific research or various monitoring
programmes, help citizens to play an active role in the data collection process
and enable them to exchange data, information and knowledge, to reach the
experts who can answer questions about the issues being addressed, and to
disseminate information to further the understanding of such issues.
The Chinese proverb ‘Tell me and I’ll forget; show me and I may remember;
involve me and I’ll understand’ is an apt quotation in this context, since both
CS and COs have great potential as instruments to raise awareness,
increase citizen participation and support community-based environmental
decision-making.

Whereas CO approaches focus very much on two-way communication
between citizens and other stakeholders, such as scientists, this may not
always be the case for CS: here, the degree of participation can vary from
only collecting data to participating in the study design and data analysis. In
addition, CS usually refers to science/scientific projects, whereas COs include
a broad range of stakeholders, including authorities or policy-makers.
However, the combination of both top-down and bottom-up approaches makes
COs a more complex tool, especially as they require an ICT infrastructure,
which is not necessarily required for CS initiatives.


2  Current  Citizen  Science  and  Citizens’  Observatory 
Programmes  in  Europe 

2.1 Citizens’ Observatory Projects

In recent years, there have been many ongoing CO projects in Europe. For
example, the European Commission (EC) has seen the possibility of empowering
European citizens in environmental monitoring, with the consequent increase in
observational possibilities. The EC has provided funding through its Seventh
Framework Programme for five projects (i.e., Citclops, CITI-SENSE, COBWEB,
OMNISCIENTIS and WeSenseIt) with the aim of building COs in various
environmental fields. For example, OMNISCIENTIS has combined the active
participation of citizens with the implementation of innovative technologies for
improving the governance of odour nuisance (OMNISCIENTIS, 2012-2014).
Other projects that emphasise the need for citizens’ participation are COBWEB,
which aimed at creating a test-bed environment that would enable citizens
living within Biosphere Reserves to collect environmental data using mobile
devices (COBWEB, 2012-2016; Higgins et al., 2016); Citclops, which aimed at
developing an observatory based on CS applications for bio-optical monitoring
of coast and ocean (Ceccaroni et al., 2016; Citclops, 2012-2015); and
WeSenseIt, which puts emphasis on enabling citizens to become active
stakeholders in information capturing, evaluation and communication for the
water environment, including flood risk (WeSenseIt, 2012-2016). Finally,
CITI-SENSE aimed at empowering citizens to participate in environmental
governance by developing various CO supporting services related to outdoor air
quality, indoor air quality in schools and environmental perception in public
spaces (CITI-SENSE, 2012-2016). These five CO projects were designed
independently of each other; however, they had considerable similarities in
terms of their structure, operation and methodology for communication with the
public (Liu et al., 2014). Furthermore, there has been cross-project
collaboration amongst these five projects to (i) facilitate data, knowledge and
success sharing amongst the projects,
and (ii) establish common methodologies and standards for crowdsourcing/
citizen science within GEOSS and aligned with INSPIRE and Copernicus1.

In addition, four projects have been funded under the EC H2020 topic
SC5-17-2015, ‘Demonstrating the concept of “Citizen Observatories”’ (EC,
2015-2016), that aim to scale up, demonstrate, deploy, test and validate, under
real-world conditions, the concept of CO and the effective transfer of
environmental knowledge for policy, industrial, research and societal use, with
a focus on the domain of land cover/land use, both in rural and urban areas. The
EC H2020 topic CSA-2017 (‘Coordination of Citizens’ Observatories
initiatives’; EC, 2016-2017b) aims at bringing existing CO and related
communities together, and the EC H2020 topic RIA-2017 (‘Novel in-situ
observation systems’) will further develop ICTs and test them in various CO
activities (EC, 2016-2017a).

Furthermore, with an increasing number of CO-based initiatives, the EU
H2020 Work Programme 2016-2017 (Topic in SC5-19-2017) issued a call for
the coordination of citizens’ observatories initiatives (EC, 2016-2017b) to
create a CO knowledge base in Europe across disciplines in order to avoid
duplication, ensure interoperability, create synergies and facilitate the
gradual uptake of this knowledge base by environmental authorities.

There  are  more  existing  and  planned  CO-related  activities  supported  by  the 
EC  programmes  and  calls,  for  example: 

• CAPS - Collective Awareness Platforms for Sustainability and Social
Innovation (ICT Calls; 34 existing projects (EC, 2016));

• The new calls in 2016-2017 (EC, 2016-2017c) and Pilots and Coordination
and Support Actions;

• Integrating Society in Science and Innovation (RRI) (EC, 2017);

• MYGEOSS (EC, 2015); and

• RIA - Novel in-situ observation systems (EC, 2016-2017a), etc.


2.2  Citizen  Science  Projects 

In recent years, there has been a boom in CS projects, with many now
harnessing new technologies, such as mobile Internet and smartphone apps,
to increase accessibility and remote participation. For example, more than
1,600 formal and informal research projects, tools and events are listed on
SciStarter and the number is increasing rapidly (SciStarter, 2017). Some of the
best known projects were and are run by the previous Zooniverse team, now the
Citizen Science Alliance, which launched the Galaxy Zoo galaxy-classifying
project in 2007 (Zooniverse, 2013), and whose crowdsourcing model has been
adopted by many other groups. However, there are many more examples of CS
projects, which include, but are certainly not limited to, topics such as
biological monitoring (e.g., the Cornell Lab of Ornithology,
www.birds.cornell.edu; the Great Backyard Bird Count2; the big butterfly
count3), geography (e.g., OpenStreetMap4), air quality (e.g., Air Quality
Egg5), and others that encompass different models of CS; within the
environmental sciences, these span a diverse range of subjects.

The  CS  activities  can  differ  in  focus,  approach  or  technique.  Various  reviews 
indicate  that  the  most  prominent  topics  for  CS  are  biology,  conservation  and 
ecology,  with  citizens  assisting  in  the  collection  and  classification  of  data 
(Kviner,  2012;  Science  Communication  Unit,  University  of  the  West  of  Eng¬ 
land,  2013;  Liu  et  al.,  2014;  Grossberndt  and  Liu,  2016).  Another  main  cluster 
is  geographic  information  research,  with  citizens  collecting  geographic  data; 
as  the  third  most  prominent  group  of  CS  topics,  the  study  identified  research 
involving  the  public  in  relation  to  environmental  and  health  issues  (Kullenberg 
and  Kasperowski,  2016).  There  are  also  ‘higher  level’  initiatives,  like  the  Open 
Air  Laboratories  (OPAL)  for  CS  initiatives  focused  on  nature6,  Geo-Wiki  for 
projects  addressing  global  land  cover  issues7  or  Zooniverse,  serving  as  a  hub  for 
projects  from  different  fields8. 

In Europe, CS has grown in scale and scope, and is therefore receiving
increasing attention from scientists and policy-makers at local, national
and international levels. Some of the well-known European CS projects are
ENERGIC9, EmoMap10 and EveryAware11. Gradually, CS has come to be considered
an independent discipline. For example, there are academic groups and
collaborations (Science Communication Unit, University of the West of England,
2013), including the Citizen Cyberlab12, a Swiss partnership involving CERN,
the UN Institute for Training and Research and the University of Geneva, and
OPAL13. Furthermore, there are large-scale experiments at JRC (EC JRC,
2014) to (i) assess the quality of social network data of 2010-2012 (by
comparison with official data from EFFIS); (ii) map CS and Smart Cities
projects; (iii) develop the typology of CS, set up facilities for social media
data analysis and develop analytical tools; (iv) set up a framework for hosting
citizen science project data (e.g. CitObs, EveryAware), websites and code after
the end of a project; (v) develop interoperability protocols and integration
with official data sources (INSPIRE, Copernicus); (vi) develop partnerships with
relevant stakeholders (e.g. ECSA, 2016); and (vii) explore the use of
citizen-generated content to develop new indicators of quality of life in urban
areas, with comparison to official sources (e.g. Eurobarometer).

3  Citizen  Science  and  Citizens’  Observatories  for 
Policy  and  Decision-Making 

The increasing numbers of CS activities and the rise of COs in recent years
demonstrate one key fact: science needs public participation. We have already
stated that the involvement of volunteers in the collection of observations and
data can be beneficial for scientists who suffer from a constraint of resources.
Another advantage that we inevitably come across is the fact that the
participation of citizens in science will also serve the purpose of awareness
raising, i.e. that people become more aware of problems or issues related to
their direct environment and are consequently more likely to be interested in
the initiative and more willing to participate (Evans et al., 2005; Haklay,
2015). Several reviews of CS and CO projects indicate that the involvement of
volunteers in science adds value in terms of science literacy and educational
effects (Kviner, 2012; Science Communication Unit, University of the West of
England, 2013; Haklay, 2015; Grossberndt and Liu, 2016). A review of more than
230 “citizen science” projects concluded that volunteers have proven to provide
information that has ‘high value to research, policy, and practice’ (Tweddle et
al., 2012).

Although  public  participation  has  been  given  more  attention  in  environmen¬ 
tal  governance  processes  recently,  in  most  places  it  is  still  in  its  infancy.  In  1998, 
the  Aarhus  Convention  strengthened  public  participation  through  the  estab¬ 
lishment  of  ‘the  right  to  know’,  i.e.,  the  access  to  environmental  information, 
public  participation  in  environmental  decision-making  and  access  to  justice 
(UNECE,  1998).  The  EC  Directive  2003/35/EC  was  adopted  in  2003  to  pro¬ 
vide  for  public  participation  and  thus  implement  the  Aarhus  Convention  in  the 
Member  States  of  the  EU  (EC,  2003). 

Involving  citizens,  and  not  only  scientific  experts,  in  environmental  governance 
processes  creates  new  opportunities.  The  EC  published  a  White  Paper  in  2001 
(EC,  2001),  where  they  called  upon  different  actors  for  cooperation  within  the 
whole  process  of  environmental  governance.  The  White  Paper  points  to  decision¬ 
makers  and  scientists  as  actors  of  such  governance,  but  also  requests  explicitly  the 
inclusion  of  representatives  from  civil  society.  In  2014,  the  EU  project  Socientize 
developed  a  White  Paper  on  Citizen  Science  for  Europe  (Socientize,  2012-2014), 
which  aimed  to  support  policy-makers  at  the  European,  national  and  regional 
levels  to  set  up  future  strategies  of  civic  engagement. 

Both  CS  and  COs  can  provide  scientists  with  important  and  reliable  data, 
enabling  authorities  to  carry  out  informed  policy-making,  while  providing 
citizens  with  opportunities  to  address  issues  affecting  them  at  different  scales. 
As  citizens  develop  an  increased  scientific  and  environmental  understanding, 
they  may  begin  to  influence  decision-making  and  policy  through  activities 
such  as  petitions,  public  debate  and  advocacy,  e.g.,  for  identifying  new  policy 
issues,  generating  policy  options,  lobbying,  supporting  joined-up  governance, 
etc.  (Walters  et  al.,  2000).  An  example  of  participatory  monitoring  impacting 
policy  can  be  seen  in  Cambodia,  where  the  Committee  for  Free  and  Fair  Elec¬ 
tions  uses  voter  scorecards  and  volunteers  with  mobile  phones  to  monitor  if 
elected representatives keep their election promises. Such initiatives have a
direct  impact  on  local  policy  and  are  the  direct  result  of  citizen  participation 
and  observation  (Bottomley,  2014).  However,  many  CS  and  CO  programmes 
have  yet  to  be  evaluated  for  these  impact  attributes. 

As  addressed  earlier  in  this  chapter  (see  Section  1.2),  the  CO  as  a  new  con¬ 
cept  that  considers  the  wider  implications  of  CS  has  evolved  in  EU  policy  cir¬ 
cles.  The  existing  and  planned  CO  projects,  and  the  results  of  their  preliminary 
testing in practice, indicate that COs have a great potential to complement
in-situ  observation  networks  and  to  contribute  to  European  policies  covering 
areas  from  water  management  and  air  quality  protection  to  biodiversity  con¬ 
servation. 

In  the  ‘Citizen  Science  and  Policy:  A  European  Perspective’  report  (Haklay, 
2015),  the  following  three  policy  dimensions  are  distinguished:  (1)  level  of 
geography;  (2)  policy  domains;  and  (3)  level  of  engagement  and  type  of  CS 
activity.  CS  initiatives  can  influence  policy  decisions  in  a  specific  geographic 
area,  i.e.  local,  regional,  national  or  international.  Usually,  problems  that  affect 
the  direct  environment  lead  to  more  engagement,  since  people  are  more  con¬ 
cerned  (Haklay,  2015).  This  increased  awareness  can  be  leveraged  to  engage 
local  people  to  contribute  to  CS  initiatives.  Local  CS  is  often  linked  to  environ¬ 
mental  activism  and  supports  community  management  by  working  towards 
effective  and  meaningful  management  planning  and  stewardship  (Conrad  and 
Hilchey,  2011).  Local  CS  can  also  apply  the  so-called  community-based  moni¬ 
toring  (CBM)  approach.  CBM  describes  a  process  where  concerned  citizens, 
public  authorities  and  further  stakeholders  collaborate  to  monitor,  track  and 
respond  to  issues  that  arise  from  common  community  concerns  (Whitelaw  et 
al.,  2003). 

There  is  an  increasing  need  for  communities  to  fall  back  on  CS  approaches 
(or  on  CBM  ones)  and  include  different  stakeholders  with  their  diverse  knowl¬ 
edge  and  experience  into  decision-making  processes  (Conrad  and  Daoust, 
2008).  In  addition  to  potential  savings  in  time  and  money  for  decision-making 
bodies,  the  societal  benefits  of  CBM  will  be  to  create  environmental  democ¬ 
racy,  social  capital,  and  an  increase  in  scientific  literacy  and  inclusion  in  local 
issues  (Conrad  and  Hilchey,  2011). 

Policy  areas  can  be  manifold  and  partially  overlapping.  For  example,  city- 
scale  policy  includes  public  transport,  environmental  quality,  education,  infra¬ 
structure  and  public  health.  Thus,  cities  can  be  a  canvas  for  a  potpourri  of 
local  monitoring  activities,  originating  from  different  concerns  but  using  the 
accumulated  data  to  see  the  bigger  picture.  Moving  CS  projects  to  the  regional, 
national  or  even  international  level  is  likely  to  meet  even  more  challenges  than 
there already are. Since bottom-up initiatives usually have only limited budgets,
community science approaches that actively involve citizens in all parts of the
participation cycle are less likely at these levels; citizens will instead
be asked only to share observations or viewpoints on certain issues.
Nevertheless,  national  and  even  international  initiatives  including  CS  are  pos¬ 
sible  and  do  exist.  Projects  funded  by  the  EC  and  formations  of  international 
organisations like the European Citizen Science Association (see note 14) provide
frameworks for national initiatives and NGOs to create synergies to promote CS
on  larger  scales  and  to  call  on  international  institutions  such  as  the  European 
Environment  Agency  (EEA)  to  promote  citizen  participation  at  the  interna¬ 
tional  level  as  well  (Haklay,  2015). 

At present, there are still relatively few CS and CO examples that demonstrate
where such projects have had a clear and distinct impact on both policy-
and decision-making. However, this is dependent on how one perceives
the  ‘level’  of  impact.  Monitoring  projects  may  not  bring  about  immediate 
policy  change,  but  their  usefulness  in  building  up  evidence  bases  is  invalu¬ 
able.  For  example,  the  UK  Biodiversity  Indicators  rely  directly  on  the  long¬ 
term  data  that  NGOs  and  their  volunteers  collect  for  species  such  as  birds 
and  butterflies.  These  biodiversity  indicators  feed  directly  into  wider  UK  and 
global  policy,  such  as  the  Convention  on  Biological  Diversity  Strategic  Plan 
for Biodiversity 2011-2020. Other projects that focus on observing and identifying
invasive species, for example PlantTracker and the Harlequin Ladybird
Survey,  are  valuable  and  will  become  increasingly  relevant  to  policies  in  this 
area,  such  as  the  recently  proposed  EU  Regulation  on  Invasive  Alien  Species 
and the development of tree health policies within the UK (British Ecological
Society,  2013). 

Both CS and COs have an extremely important role to play in today’s environmental
science and research, and, through modern technology, innovative
projects  and  new  partnerships,  the  involvement  of  the  public  will  only  increase. 
The  role  of  CS  and  CO  projects  in  policy  is  relatively  hard  to  gauge,  but  they 
are  invaluable  for  building  up  evidence  bases  and  directing  change  -  especially 
those projects that are linked to pressure groups (i.e. groups that try
to influence public policy in the interest of a particular cause) or that address
environmental issues at the population level. Equally, given the educational values
that citizen projects can provide, such projects may be influencing people’s
mindsets,  which  in  turn  could  influence  policy  decisions  in  ways  that  are  more 
abstract.  As  such,  people  really  are  power,  not  just  for  science  but  for  policy¬ 
making  too  (British  Ecological  Society,  2013). 


4  Challenges  and  Development  Needs 

As  we  have  seen  in  this  chapter  so  far,  the  idea  of  citizens  participating  in  envi¬ 
ronmental  governance  is  found  not  only  in  citizens’  initiatives,  but  also  at  the 
international  level,  with  e.g.  the  EU  or  UN  as  driving  forces.  However,  there 
is still a discrepancy between theory and practice, owing to different circumstances
and challenges. We shall now look more closely at the challenges that
are  connected  to  the  implementation  of  CS  and  COs  in  environmental  govern¬ 
ance. 

In  this  section,  we  distinguish  between  four  different  categories  of  challenges: 

•  Technologies  and  data; 

•  Citizen  engagement; 

•  Policies  and  framework; 

•  Additional  requirements  for  COs. 


4.1 Technologies and Data

CS  approaches  and  COs  require  strict  data  management.  In  both  cases,  vol¬ 
unteers  who  do  not  necessarily  possess  the  required  skills  for  the  collection 
process  can  still  gather  large  amounts  of  data;  however,  the  obtained  data  often 
contain  errors  and  bias.  It  takes  time  and  resources  to  train  the  volunteers  to 
enable  them  to  collect  data  in  the  manner  and  of  a  quality  that  is  useful  for 
scientists,  decision-makers  and  other  stakeholders  (Conrad  and  Hilchey,  2011; 
Dickinson et al., 2010; Engelken-Jorge et al., 2014; Goodchild and Li, 2012;
Hanahan  and  Cottrill,  2004).  An  insufficient  experimental  design  can  hence 
lead  to  undesired  outcomes  (Conrad  and  Hilchey,  2011).  Another  requirement 
is  the  management  and  analysis  of  the  continuously  increasing  volume,  vari¬ 
ety  and  velocity  of  the  data  that  are  collected  throughout  the  whole  course  of 
the  initiative  (Zikopoulos  et  al.,  2011).  One  option  to  deal  with  this  issue  is  to 
build  networks  with  other  existing  projects  or  initiatives  to  use  already  existing 
datasets  and  combine  them  with  newly  obtained  data  (Dickinson  et  al.,  2010). 
However,  special  attention  must  be  paid  to  accuracy  and  uncertainty,  especially 
when comparing crowdsourced data with reference data. The same applies to the
interpretation  of  qualitative  data;  indicators  such  as  ‘quality  of  life’  or  ‘wellbe¬ 
ing’  should  be  developed  together  with  more  quantitative  data.  In  addition, 
data  security  and  privacy  are  important  issues  that  require  special  attention. 
Especially  when  using  smartphones  and/or  mobile  sensing  devices,  it  has  to  be 
ensured that the data from the volunteers are anonymised and treated according
to national and international data protection laws and standards. In addition,
ethical restrictions may apply (Liu et al., 2014). Increasing the amount of
data  requires  progressive  technologies  and  data  analysis  methods  that  reduce 
measurement  uncertainties  through  real-time,  reliable  and  fast  quality  assur¬ 
ance/quality  control  tools.  Furthermore,  there  is  an  urgent  need  to  explore  and 
develop  technologies  for  data  collection  and  analysis  by  building  the  techni¬ 
cal  capacity  required  to  combine  environmental  monitoring  with  the  exchange 
and  integration  of  different  types  of  data,  then  visualise  and  communicate  the 
results  to  end  users  (Liu  et  al.,  2014;  DFID,  2008). 
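
As a minimal sketch of the kind of automated pre-processing described above, the following Python fragment pseudonymises volunteer identifiers, coarsens coordinates and flags implausible sensor readings. The record layout, the salt, the rounding precision and the plausibility range are all illustrative assumptions and are not taken from any of the CO projects cited here; salted hashing and coordinate rounding on their own would not necessarily satisfy data protection law.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical plausibility range for a PM2.5 reading in µg/m³ (assumption).
PM25_RANGE = (0.0, 1000.0)


@dataclass(frozen=True)
class Observation:
    volunteer_id: str
    lat: float
    lon: float
    pm25: float


def pseudonymise(volunteer_id: str, salt: str) -> str:
    """Replace the identifier with a truncated, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + volunteer_id).encode("utf-8")).hexdigest()
    return digest[:16]


def prepare(obs: Observation, salt: str, decimals: int = 3) -> dict:
    """Return a shareable record that carries a simple quality-control flag."""
    low, high = PM25_RANGE
    return {
        "volunteer": pseudonymise(obs.volunteer_id, salt),
        # Three decimals coarsen the position to the order of 100 m.
        "lat": round(obs.lat, decimals),
        "lon": round(obs.lon, decimals),
        "pm25": obs.pm25,
        # Range check only; real QA/QC would need far more than this.
        "qc_pass": low <= obs.pm25 <= high,
    }


records = [
    prepare(Observation("alice@example.org", 59.91387, 10.75225, 12.4), salt="demo"),
    prepare(Observation("bob@example.org", 59.92001, 10.76002, -5.0), salt="demo"),
]
```

In this sketch, failed readings are flagged rather than dropped, so that the decision on how to treat them remains with the analyst.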

The  evaluation  of  citizen  science  and  especially  of  CO  approaches  is  another 
topic that requires further research. Indicators for evaluation and value proposition
have to be developed to facilitate the comparison of initiatives from different
fields and their effectiveness/efficiency, especially regarding engagement
and  participation. 


4.2  Citizen  Engagement 

Engaging  with  volunteers  to  participate  in  any  form  of  activity  related  to  CS  or 
COs  can  be  quite  challenging.  The  most  crucial  task  is  to  raise  the  interest  of  the 
volunteers  to  actively  participate  and  continue  until  the  end  of  the  initiative. 
If there is no interest, there will be no data. In addition, few people will spend
their  spare  time  and  resources  for  nothing;  the  volunteers  must  clearly  know 
what  to  expect  in  return,  i.e.  what  is  in  it  for  them.  Thus,  it  is  essential  to  imple¬ 
ment  various  tailor-made  tools  to  recruit  and  sustain  citizen  participation  in 
environmental monitoring activities (Fernandez-Gimenez et al., 2008; Conrad
and Hilchey, 2011). One of the preconditions for successful involvement of volunteers
in CS activities is their level of interest in the research itself. Nevertheless,
many volunteers seem to contribute very little at the beginning of data
collection activities, leaving a rather small number of volunteers contributing
the most (Sauermann and Franzoni, 2015). Thus, keeping the volunteers’ interest
through fun activities seems to bear potential for a higher contribution rate.
So-called  ‘gamification’  for  this  purpose  seems  to  show  positive  results;  how¬ 
ever,  this  is  very  much  dependent  on  the  project  type  and  the  volunteers  and 
requires  further  research  (Prestopnik  et  al.,  2014).  Immediate  and  continuous 
feedback  of  results  in  a  visually  attractive  and  easy  to  understand  manner  is 
also  important.  Social  media  can  also  be  a  good  way  to  keep  in  contact  with 
the  volunteers  (Gottschalk  Druschke  and  Seltzer,  2012).  Furthermore,  it  is 
very  helpful  to  engage  and  to  retain  citizens  by  clearly  addressing  the  positive 
aspects  of  their  participation,  for  example  the  benefits  they  can  gain,  such  as 
improved  health,  knowing  which  areas  are  polluted  and  how  to  avoid  exposure 
(in the case of air quality) or personal recognition (e.g. through a leaderboard
in  the  community).  Being  able  to  access  data  from  other  volunteers  and  to 
compare  them  to  the  data  collected  by  oneself,  as  well  as  dashboard  and  ana¬ 
lytical  tools  accessible  to  the  volunteers,  etc.,  are  all  useful  methods  to  engage 
citizens. 
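
The uneven distribution of contributions noted above can be monitored with a very simple indicator. The following Python sketch computes the share of all submissions that comes from the most active fraction of volunteers; the function name, the 10% default and the example log are hypothetical and serve only to illustrate the idea, not to reproduce the method of any study cited in this section.

```python
from collections import Counter


def top_share(contributor_ids, fraction=0.1):
    """Share of all contributions made by the most active `fraction` of volunteers."""
    counts = sorted(Counter(contributor_ids).values(), reverse=True)
    if not counts:
        return 0.0
    # Always include at least one volunteer in the "top" group.
    n_top = max(1, int(len(counts) * fraction))
    return sum(counts[:n_top]) / sum(counts)


# Hypothetical submission log: one entry per observation, ten volunteers in total.
log = (
    ["v1"] * 60 + ["v2"] * 15 + ["v3"] * 10 + ["v4"] * 5 + ["v5"] * 5
    + ["v6", "v7", "v8", "v9", "v10"]
)

share_top_10_percent = top_share(log)        # most active volunteer only
share_top_20_percent = top_share(log, 0.2)   # two most active volunteers
```

Tracking such a figure over the course of an initiative could indicate whether engagement measures such as feedback or gamification are broadening participation or merely rewarding the volunteers who are already most active.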


4.3  Policies  and  Framework 

Even  though  participative  approaches  in  environmental  governance  have  been 
repeatedly  promoted  at  an  international  level,  this  does  not  mean  that  these 
approaches  are  automatically  followed  up  at  national,  regional  or  local  levels. 
In addition to the willingness of decision-makers, their level of readiness is
a  crucial  precondition  for  success.  In  this  context,  funding  opportunities  play 
an  important  role  (Conrad  and  Hilchey,  2011;  Litke  and  Day,  1998).  CS  and 
COs  represent  powerful  and  usually  low-cost  solutions  to  address  existing  gaps 
in  environmental  governance.  These  platforms  can  allow  authorities  to  obtain 
evidence  and  provide  citizens  with  opportunities  to  address  environmental  con¬ 
cerns.  However,  often,  citizens  participating  in  environmental  governance  are 
considered  a  ‘threat’  rather  than  a  resource  to  decision-makers,  since  they  are 
deemed  to  be  in  opposition  to  the  plans  of  the  authorities  or  industries.  Citizen 
participation  should  rather  be  considered  as  a  means  to  make  environmental 
governance  more  transparent  so  that  the  citizens’  trust  in  the  conclusions  of 
experts  will  increase.  Here,  the  challenge  lies  in  integrating  CS  in  environmental 
decision-making in a manner that enhances the process by enabling it to deal
with  issues  concerning  the  community  appropriately  and  that  at  the  same  time 
takes  into  consideration  the  risks  and  opportunities  that  go  along  with  these 
practices (Haklay, 2015).


4.4  Additional  Requirements  for  Citizens’  Observatories 

In addition to the challenges that have been mentioned so far, the establishment
of COs is accompanied by a number of further development needs. COs
usually  have  a  similar  structure;  however,  when  starting  a  new  CO,  the  whole 
infrastructure  and  data  flow  have  to  be  installed  from  scratch  (Liu  et  al.,  2014). 
So  far,  there  are  no  systematic,  easy  and  reusable  methods  to  do  so.  This  causes 
an insurmountable hurdle for institutions and organisations, as they usually
lack  the  specific  technical  ICT  and  programming  knowledge  to  create  the 
required  server  infrastructure  and  mobile  applications.  As  a  result,  organisa¬ 
tions  can  fall  back  on  old-fashioned,  non-technological  methods  (which  can 
take longer to implement) or spend tremendous amounts of their often limited
budget on external ICT and programming experts (D’Hondt et al., 2014;
Zaman  et  al.,  2014). 

Liu  et  al.  (2014)  have  identified  the  following  development  needs  to  ensure  a 
functional  and  operational  CO  with  the  active  involvement  of  citizens: 

A.  The  adequate  promotion  of  a  CO  platform,  including  tools  and  activities 
for  capacity  building,  awareness  raising,  recruiting  and  maintaining  the 
participation  of  citizens; 

B.  A  good  understanding  of  the  current  and  future  societal  demography 
in  order  to  create  COs  that  meet  the  actual  and  future  needs  of  the 
population; 

C.  Building  a  long-lasting  infrastructure,  including  open  source  software 
with  the  following  requirements:  use  of  open  standards,  easy  exploita¬ 
tion  through  an  open  Application  Programming  Interface  (API),  and  the 
ability  to  be  widely  accessed,  extended  and  maintained.  A  CO  should  be 
seen  rather  as  a  generic  environmental  enabler  than  as  a  project-specific 
outcome; 

D.  Addressing  and  evaluating  both  citizens’  views  on  certain  environmental 
issues  and  their  related  actions  (‘Citizens’  Voice’)  and  the  accountability 
of  the  governments  for  their  environmental  actions  (‘Accountability’)  in 
the  social  and  political  context  of  each  CO  (Fernandez-Gimenez  et  al., 
2008).  These  two  concepts  should  be  actively  promoted  as  important 
dimensions of good environmental governance, including in relation
to  the  improvement  of  social  justice  (DFID,  2008;  Kamar  et  al.,  2012); 

E.  Developing  tailor-made  channels  and  mechanisms  to  enable  citizens  to 
actually  influence  environmental  governance  processes. 


5 Conclusions

Engaging  citizens  in  science  and  environmental  observations  is  a  challenging 
task.  While  many  scientists  are  cautious  about  using  data  from  volunteered 
observations,  others  believe  that  the  quality  of  such  data  is  sufficient  to  allow 
them  to  either  use  or  publish  the  data  while  admitting  that  further  work  may 
be  required  before  applying  such  data  in  other  ways.  However,  we  cannot  say 
much  about  the  quality  of  data  from  COs,  as  further  research  is  still  needed. 
The  need  for  further  research  also  applies  to  the  validation  processes,  data  inte¬ 
gration  and  quality  management.  Merging  citizen  data  with  authoritative  data 
and  integration  with  other  existing  data  may  also  be  considered.  Another  way 
to  improve  data  quality  is  to  pay  attention  to  the  composition  of  the  volunteer 
groups.  In  order  to  avoid  imbalances  and  biases  in  the  observations,  the  vol¬ 
unteers  should  be  representative  of  different  groups  (e.g.  different  age,  gender 
or  cultural  background  groups,  etc.).  Applying  co-design  approaches  in  the 
design  of  the  study/initiative  can  also  be  a  useful  way  to  maximise  outputs  of 
the  observation  process. 

In  order  for  citizens  to  participate  in  CS  and  CO  initiatives,  we  have  to 
create  activities  with  low  barriers  and  with  incentives  for  citizens  to  both 
start  participating  and  continue  to  do  so.  To  succeed,  we  (the  scientists)  have 
to  respect  every  volunteer  and  the  role  they  play,  manage  their  expectations 
and be transparent in our plans and actions. In addition, we must protect
private data and create secure solutions. To the same degree, we
have  to  respect  and  deal  with  the  expectations,  concerns  and  fears  of  pub¬ 
lic  authorities  in  the  same  open  and  transparent  manner.  It  is  important  to 
include  and  engage  public  authorities,  where  applicable,  from  the  start  to 
increase  the  chances  of  sustainable  outcomes  and  solutions,  and  to  influence 
their  policies. 

More  can  be  done  to  promote  citizen  participation  in  environmental  gov¬ 
ernance.  With  its  latest  Framework  Programme  for  Research  and  Innovation, 
Horizon  2020,  the  EC  is  strongly  promoting  citizen  engagement.  Aiming  to 
deepen  the  relationship  between  science  and  society  and  to  reinforce  public 
confidence  in  science,  Horizon  2020  should  foster  the  informed  engagement  of 
citizens  and  civil  society  in  research  and  innovation  by  promoting  science  edu¬ 
cation,  making  scientific  knowledge  more  accessible  and  developing  responsi¬ 
ble  research,  as  well  as  innovation  agendas  that  meet  the  actual  concerns  and 
expectations  of  citizens.  In  order  to  facilitate  the  participation  of  citizens  in 
Horizon  2020,  the  engagement  of  citizens  and  civil  society  should  be  coupled 
with public outreach activities to generate and sustain public support for Horizon
2020 and beyond. Furthermore, EU research in this area often consists of
top-down prescribed CO and CS programmes, which would need to be compatible
with the existing bottom-up networks and the true data needs of citizens.
Together, these top-down and bottom-up approaches allow us to minimise
the differences and maximise the similarities among multiple systems,
enabling  both  individual-case-study  data  analysis  and  integrated  data  analysis 
to  be  performed  (Liu  et  al.,  2014). 

The  growth  in  Web-based  CS  and  COs  and  the  use  of  mobile  phones  have 
opened  many  new  opportunities  for  instrumental  observations  that  can 
enhance  the  abilities  of  analysts  to  use  this  information  for  decision-making 
processes.  Overall,  policy-makers  and  government  officials  need  to  be  aware 
that  CS  and  COs,  in  the  latter’s  new  incarnation,  are  a  phenomenon  that  will 
continue  to  grow  and  impact  all  levels  of  government.  Each  CS  and  CO  activity 
will  always  involve  trade-offs  between  inclusion  of  people,  education,  aware¬ 
ness  of  science  and  contribution  to  scientific  research;  the  emerging  examples 
from Europe show that, with appropriate multidisciplinary teams, it is nevertheless
possible to achieve several of these goals in any given activity.

Another  opportunity  within  COs  is  the  potential  for  social  innovation,  novel 
partnerships  and  creating  new  opportunities  for  SMEs.  This  would  meet  the 
need  for  more  cross-cutting  and  transdisciplinary  activities  that  again  would 
result  in  the  creation  of  synergies  and  the  facilitation  of  interoperability  and 
coordination. 

Whereas  CS  initiatives  have  had  the  chance  to  learn  and  undergo  different 
changes  through  the  course  of  the  last  decades,  the  concept  of  CO  is  rather 
young.  Initiatives  following  this  approach  are  still  at  an  early  stage  and  an  hon¬ 
est  discussion  about  their  risks  and  opportunities  needs  to  be  carried  out  with 
citizens,  scientists,  authorities  and  other  potential  stakeholders  in  order  to 
determine  the  full  potential  and  areas  of  application  of  COs;  only  the  future 
will  show  if  our  efforts  were  worth  it. 


Notes


1  http://citizen-obs.eu/ 

2  http://gbbc.birdcount.org/ 

3  http://www.bigbutterflycount.org/ 

4  http://www.openstreetmap.org 

5  http://airqualityegg.com/ 

6 http://www.opalexplorenature.org/aboutOPAL and http://www.imperial.ac.uk/opal/

7  http://www.geo-wiki.org 

8  https://www.zooniverse.org/ 

9  http://vgibox.eu/ 

10  http://cartography.tuwien.ac.at/emomap/ 

11  http://www.everyaware.eu/ 

12  http://www.citizencyberscience.net/ 

13 http://www.opalexplorenature.org/aboutOPAL and http://www.imperial.ac.uk/opal/

14  https://ecsa.citizen-science.net/ 


Abstract

In  this  final  chapter,  we  speculate  on  future  developments  in  the  field  of  Volun¬ 
teered  Geographic  Information  (VGI);  we  focus  on  how  VGI  will  be  affected 
by  future  technological  developments,  but  we  also  consider  issues  such  as  VGI 
quality, the relationship of VGI with science and citizens, and the impact of
VGI  in  future  cities  and  societies. 


Keywords 
and ethical concerns


1  Introduction 

Katherine  is  a  typical  citizen  of  the  future.  The  year  is  2030.  Like  most  morn¬ 
ings,  Katherine  gets  up  and  goes  for  a  run,  wearing  sensors  embedded  in  her 
clothes.  These  sensors  monitor  her  vital  signs  and  communicate  with  her 
smartphone, alerting her to anything unusual. With her permission, the sensors
also  send  the  data  to  many  different  places,  including  to  her  medical  records, 
her  health  insurance  company  and  a  vast  supercomputing  facility,  which  uses 
her  Volunteered  Geographic  Information  (VGI),  along  with  that  of  millions 
of  other  citizens,  to  uncover  behavioural  and  health  patterns  that  can  be  used 
to  provide  doctors  with  preventative  health  care  advice.  Before  going  to  work, 
Katherine  controls  the  environment  of  her  house  using  her  smartphone;  this 
VGI  gets  sent  to  her  gas  and  electricity  companies,  who  use  the  data  to  bill  her, 
but  also  to  determine  customer  behaviour  so  that  they  can  optimise  their  tariffs 
and  provide  customers  like  Katherine  with  advice  on  how  to  save  money  while 
being  environmentally  friendly.  Katherine’s  driverless  electric  car  takes  her  to 
work,  where  she  is  a  spatial  data  quality  expert  at  the  National  Mapping  Agency 
(NMA)  in  her  country.  She  is  responsible  for  the  quality  assurance  and  quality 
control  of  the  NMA’s  spatial  databases.  Today  she  is  focused  on  doing  some  rou¬ 
tine  quality  assurance  on  the  main  topographic  database,  which  is  a  dynami¬ 
cally  updated  set  of  layers  that  takes  in  changes  from  a  range  of  users,  including 
citizens.  She  does  some  checks  to  ensure  that  the  automated  quality  assurance 
procedures  are  filtering  out  data  that  do  not  meet  the  minimum  requirements 
for  the  database  and  determines  where  to  send  field  surveyors  to  confirm  any 
critical  changes.  Today  is  Friday  and  Katherine  is  looking  forward  to  attending 
a  weekend  mapping  party,  which  will  focus  on  helping  another  country  build 
up its own quality-assured topographic database with seamless input from
experts  like  her,  interested  citizens,  businesses  and  non-governmental  organisa¬ 
tions  on  the  ground. 

This  vision  of  a  future  world  in  which  Katherine  lives  is  not  that  far  away  and 
many  of  these  things  are  already  happening,  even  if  only  on  a  small  scale  at 
present.  Although  providing  longer  term  predictions  about  VGI  is  a  challenge 
because  VGI  is  heavily  reliant  on  rapidly  changing  technologies,  it  is  clear  that 
the  role  of  citizen  sensors  is  likely  to  become  much  more  prominent  than  it  is 
today. It is anticipated that citizen-derived data will grow considerably and be
used  in  increasingly  diverse  ways  in  the  near  future.  The  amount  of  spatial  data 
available  is  increasing  exponentially  (Craglia  and  Shanley,  2015),  and  the  diver¬ 
sity  of  data  sources  and  types  is  also  increasing,  e.g.  through  current  trends 
such  as  Digital  Earth  (Craglia  et  al.,  2012),  Smart  Cities  (Batty  et  al.,  2012), 
Citizen  Science  (Bonney  et  al.,  2009;  Silvertown,  2009),  the  Internet  of  Things 
(IoT;  Ashton,  2009)  and  Data  Analytics  (Kitchin,  2013).  Thus,  this  chapter  will 
attempt  to  examine  the  relationship  between  VGI  and  a  number  of  these  cur¬ 
rent  technological  trends.  We  also  consider  VGI  quality,  which  will  continue  to 
be  one  of  the  most  important  obstacles  for  the  future  diffusion  of  VGI,  as  well 
as  legal  and  ethical  concerns. 


2  Technology 

VGI  has  been  heavily  based  on  advances  in  the  information  and  communi¬ 
cation  technology  (ICT)  domain.  Web  2.0  applications  (O’Reilly,  2007),  GPS- 
enabled  devices  and  the  open  availability  of  very  fine  spatial  resolution  satellite 
sensor  imagery,  sensor-equipped  portable  devices  and  smartphones  have  all 
been  growth  drivers  for  crowdsourced  spatial  data.  Thus,  it  is  expected  that 
future  advances  in  these  areas  will  continue  to  play  a  major  role  in  the  future 
of  VGI. 

As  an  initial  technological  consideration,  it  can  be  noted  that  the  basic  infra¬ 
structure,  such  as  Internet  availability,  bandwidth  and  processing  power,  has  an 
important  role  to  play;  such  infrastructure  examples  are  all  expected  to  evolve 
considerably  and  thus  to  greatly  affect  both  the  number  of  people  online  and 
the  quality  of  connectivity  and  communication.  Based  on  what  we  have  experi¬ 
enced  during  the  last  few  decades,  it  is  safe  to  say  that  the  way  in  which  people 
are  connected  online  will  move  to  a  totally  new  level. 

The  continuing  developments  in  location-aware,  data  capturing  devices  are 
likely to impact greatly on the future of VGI. The removal of the selective
availability of the GPS signal (Clinton, 2000) has led to the proliferation
of  GPS-enabled  sensors  in  even  low-cost  everyday  devices.  Thus,  location- 
enabled  devices  are  now  everywhere,  from  smartphones  and  cameras  in  our 
pockets  to  cars,  airplanes  and  ships  around  the  world.  However,  there  is  a 
clear distinction to be made: on the one hand, there are human-controlled
devices  that  collect  data  in  relation  to  an  individual’s  activity,  while,  on  the 
other  hand,  there  are  sensors  that  constantly  collect  and  transmit  location- 
aware  data  about  a  phenomenon.  Regarding  the  former,  our  generation  has 
witnessed  the  appearance  of  mobile  phones,  which  then  evolved  into  smart¬ 
phones  and  have  now  been  transformed  into  location-capturing  devices; 
when  combined  with  web  applications  and  social  networking,  the  volumes 
of  data  created  are  immense.  There  are  many  examples  of  Web-based  applications,
such  as  Facebook,  Flickr,  Foursquare,  etc.,  where  the  data  come  from
the  conscious  use  of  these  applications  but  the  geographic  information  (GI) 
is  generated  implicitly  by  the  users  without  the  original  aim  of  actually  creat¬ 
ing  geospatial  datasets.  This  can  be  distinguished  from  the  proliferation  of  all 
kinds  of  sensors  that  passively  collect  spatial  data,  mostly  in  an  urban  context. 
From  high-end  sensors  to  do-it-yourself,  low-cost  devices  based  on  hardware 
platforms  such  as  Raspberry  Pi  and  Arduino,  the  flow  of  sensor-recorded 
location  data  is  expected  to  increase.  All  these  connected  sensors  are  part  of 
the  vision  of  the  IoT.  Widespread  sensor  networks  may  dominate  the  urban 
fabric  initially,  but  then  expansion  to  a  global-wide  sensor  network  would  be 
a  natural  continuation  of  this  trend  in  sensor  technology. 

While  the  human-controlled  and  sensor  network  data  sources  of  GI  have, 
up  until  now,  been  working  in  a  complementary  way,  this  situation  could  also 
change  in  the  future.  A  key  question  is  whether  developments  in  ubiquitous 
sensing  will  lead  to  a  decline  in  human-collected  VGI.  For  example,  to  know 
how  people  are  moving  inside  a  city,  will  it  be  necessary  to  tap  into  data  from 
wearable  technology  if  we  can  use  sensors  to  automatically  count  the  number 
of  people  crossing  every  street  in  every  city?  Will  we  need  people  to  measure 
air  quality  (Goodchild,  2007)  or  make  noise-maps  (Foerster  et  al.,  2010)  if  we 
have  low-cost  air  and  noise  sensors  located  on  every  street  corner?  Moreover, 
sensor-collected  data  will  not  suffer  from  some  of  the  quality  issues  or  biases 
that  usually  accompany  human-collected  VGI.  Some  technologies  may,  how¬ 
ever,  rely  on  VGI  to  function  properly  or  to  realise  their  full  potential.  Take, 
for  example,  smart  thermostats,  which  are  intended  to  learn  over  time  and 
make  adjustments  that  improve  the  efficiency  of  heating/cooling  systems 
while  maximising  the  comfort  of  users.  Such  connected  devices  or  sensors  of 
the  IoT  require  some  active  human  intervention  and  thus  will  always  involve 
some  form  of  VGI.  Many  more  electronic  devices  of  this  nature  are  expected  to 
emerge  in  the  near  future. 

Technological  trends  also  cover  advances  in  software  and  algorithms.  It  is 
likely  that  the  technology  for  handling  large  and  complex  datasets  will  advance 
in  ways  that  will  more  fully  exploit  the  use  of  VGI.  Data  quality  is  a  major  issue 
related  to  VGI  at  present,  so  it  is  likely  that  in  the  future  we  will  develop  new, 
sophisticated  algorithms  to  address  biases  and  quality  issues  that  arise  from 
the  spatial  distribution  of  participation  (see  e.g.  Haklay,  2010;  Antoniou,  2011; 
Barron  et  al.,  2014).  This  will  reveal  the  areas  and  feature  types  that  suffer  more 
in  terms  of  quality  and  thus  need  more  directed  attention  from  volunteers. 
Just  imagine  a  map  with  the  following  stated  differences  in  scale,  and  hence 
in  positional  accuracy,  due  to  heterogeneous  citizen  contributions:  ‘in  urban 
areas  roads  are  of  scale  1:5,000,  buildings  are  of  scale  1:25,000  and  land  cover  is 
of  scale  1:50,000,  but  in  rural  areas  land  cover  is  of  scale  1:10,000,  roads  are  of 
scale  1:25,000  and  buildings  are  of  scale  1:10,000;  urban  areas  are  more  complete
than  rural  ones’.  One  could  imagine  similar  caveats  regarding  thematic
accuracy.  It  is,  therefore,  anticipated  that  VGI  projects,  based  on  this
algorithmic  evaluation  of  quality,  will  want  to  guide  their  contributors  to  specific  areas
or  spatial  feature  types  in  order  to  counterbalance  any  recorded  biases  (see  for
example  how  Geograph¹  informs  its  contributors).  However,  it  is  uncertain
what  this  ‘algorithmic  management’  (Lee  et  al.,  2015)  will  do  to  VGI.  On  one 
hand,  it  may  greatly  enhance  the  quality  and  thus  the  acceptance  by  a  broader 
audience  of  VGI.  On  the  other  hand,  if  this  results  in  removing  features  such  as 
the  freedom  of  expression,  fun  and  intuitiveness  from  the  contribution  process, 
this  may  severely  curtail  VGI  as  a  phenomenon  in  the  future. 
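
As a rough illustration of how such an algorithmic evaluation might point contributors towards neglected areas, the sketch below counts contributions per grid cell and lists the cells that fall below a threshold. The function names, the one-degree cell size and the threshold of three contributions are assumptions made purely for illustration; they are not taken from any of the VGI projects discussed here, and a raw contribution count is only a crude proxy for quality.

```python
from collections import Counter


def cell_of(lon, lat, cell_deg=1.0):
    """Return the (column, row) index of the grid cell containing a point."""
    return (int(lon // cell_deg), int(lat // cell_deg))


def under_mapped_cells(points, cells_of_interest, min_count=3, cell_deg=1.0):
    """List the cells of interest that hold fewer than min_count contributions.

    points            -- iterable of (lon, lat) contribution locations
    cells_of_interest -- iterable of (column, row) cell indices to check
    min_count         -- hypothetical threshold below which a cell is flagged
    """
    counts = Counter(cell_of(lon, lat, cell_deg) for lon, lat in points)
    # Counter returns 0 for cells that received no contributions at all.
    return sorted(c for c in cells_of_interest if counts[c] < min_count)


# Illustrative input: three contributions in one cell, one in its neighbour.
contributions = [(0.5, 0.5), (0.2, 0.7), (0.9, 0.1), (1.5, 0.5)]
study_area = [(0, 0), (1, 0), (2, 0)]
print(under_mapped_cells(contributions, study_area))
```

A project could show the flagged cells to its volunteers as suggestions rather than instructions, which is one way of keeping the contribution process voluntary while still counterbalancing spatial bias.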

In  summary,  technology  will  continue  to  evolve,  and  VGI  will  certainly  con¬ 
tinue  to  leverage  technological  advances.  Strong  indications  of  what  the  near 
future  will  bring  are  already  visible.  Indoor  positioning  and  mapping  devices 
(see  for  example  Google’s  Tango  project²)  will  bring  VGI  into  built-up  areas.
Drones  are  becoming  increasingly  popular  and  we  are  still  exploring  their 
potential  as  a  source  of  data  for  many  different  fields,  from  humanitarian  appli¬ 
cations  to  land  cover  and  elevation  mapping.  Finally,  wearable  technology, 
which  is  still  at  an  early  stage,  is  expected  to  become  ubiquitous  and  will  vastly 
multiply  the  amount  of  spatial  data  on  the  Web.  These  are  just  a  few  examples 
of  what  the  future  holds,  and  they  have  the  potential  to  vastly  influence  and 
shape  the  field  of  VGI. 


3  VGI,  Smart  Cities  and  Digital  Earth 

Both  the  growth  of  VGI  and  the  evolution  of  technology  have  pushed  forward 
the  initiatives  of  Smart  Cities  and  Digital  Earth.  The  transformation  of  our  liv¬ 
ing  environment  into  a  smart,  interconnected  place  will  lead  to  a  more  detailed 
recording,  and  hence  a  better  understanding,  of  the  spatial-temporal  pattern 
of  human  activity.  As  Roche  (2014)  points  out,  the  future  of  smart  cities  will 
probably  be  spatially  enabled  and  develop  new  spatial  skills.  Thus,  if  we  better 
understand  the  structure  of  future  cities  and  of  the  human  activities  taking 
place  within  them,  we  will  also  be  better  placed  to  understand  the  role  of  VGI 
within  them. 

Spatially  enabling  our  cities  is  easier  said  than  done  but  will  very  soon  prove 
to  be  a  priority.  According  to  the  United  Nations  Environment  Programme  (n.d.),
while  cities  will  cover  only  3%  of  the  Earth’s  inhabited  land  area  by  2050,  almost 
80%  of  the  population  on  the  globe  will  live  in  cities,  which  will  account  for 
75%  of  the  total  energy  consumed  and  60-80%  of  Greenhouse  Gas  (GHG) 
emissions.  It  is  easy  for  anyone  to  understand  that  sustainability  is  one  of  the 
most  important,  yet  elusive,  societal  concerns.  However,  if  we  do  not  want  to 
lower  our  living  standards,  then  improvements  in  urban  functions  will  become 
a  necessity.  To  this  end,  geospatial  data  and  particularly  VGI  can  be  a  valu¬ 
able  input.  Urban  planners,  authorities,  local  administrations,  NGOs  and  active 
communities  can  benefit  from  detailed,  up-to-date,  timely  and  freely  available 
GI.  A  list  of  examples  of  how  VGI  is  used  by  governments  and  authorities  is 
provided  in  Haklay  et  al.  (2014),  where  the  added  value  of  using  VGI  alone
or  in  combination  with  authoritative  data  to  improve  resource  allocation,
efficiency  and  transparency  is  presented.

While  technology  will  continue  to  play  an  important  role  in  Smart  Cities, 
human  capital  is  equally  fundamental  to  city  intelligence.  Spatially  literate  citi¬ 
zens  are  needed  both  to  embrace  new  developments  and  to  push  for  innovative 
solutions.  To  this  end,  VGI  has  much  to  offer  now,  and  even  more  so  in  the 
future.  Ubiquitous  crowdsourced  spatial  information  can  serve  as  the  base- 
layer  on  top  of  which  all  future  ‘smart’  functionalities  of  a  city  could  develop. 


4  VGI  Quality 

Although  VGI  has  been  a  growing  phenomenon  for  over  a  decade  now 
(Capineri  et  al.,  2016;  See  et  al.,  2016),  one  of  the  major  factors  that  hinders  the 
more  widespread  diffusion  and  uptake  of  VGI  is  the  lack  of  a  robust  and  stand¬ 
ardised  way  to  evaluate  data  quality,  as  outlined  in  Chapter  7  by  Fonte  et  al. 
(2017).  VGI  could  both  facilitate  and  accelerate  the  transition  to  Smart  Cities 
and  Digital  Earth  if  it  were  credible  enough  to  trust  and  hence  use  in  applica¬ 
tions  that  require  accurate  GI.  However,  this  quest  for  trust,  fitness-for-purpose 
and  usability  of  VGI  data  comes  down  to  implementing  or  devising  tangible 
ways  of  measuring  and  reporting  VGI  quality.  Without  concrete  knowledge 
of  the  state  of  a  VGI  dataset,  its  use  might  end  up  being  a  leap  of  faith  that 
no  serious  stakeholder  is  willing  to  take.  Yet  if  the  quality  requirements  for 
VGI  are  too  stringent  in  terms  of  data  specifications,  precision,  update  cycles, 
spatial  coverage  or  metadata,  then  we  may  end  up  discouraging  volunteers.  At 
the  same  time,  we  need  to  avoid  the  situation  whereby  VGI  is  considered  to  be 
‘laypeople’s  data’  of  de  facto  inferior  quality,  full  of  biases,  with  no  metadata
and  only  occasional  respect  for  protocols  and  best  practices;  such  a  development
would  disrupt  the  momentum  and  the  dynamic  that  VGI  has  developed
so  far  and  would  mark  this  kind  of  data  out  as  marginal  or  as  a  cheap  and  untrustworthy
replacement  for  authoritative  datasets.  It  is  important  to  note  that  VGI
is  already  sometimes  as  good  as,  if  not  superior  to,  authoritative  data  and  can 
even  exceed  the  quality  requirements  of  NMAs  for  common  mapping  applica¬ 
tions  (Olteanu-Raimond  et  al.,  2017). 

For  these  reasons,  the  evaluation  of  VGI  data  quality  has  been  a  hot  topic 
in  academia  (see  e.g.  Haklay  et  al.,  2010;  Begin  et  al.,  2013;  Antoniou  and 
Skopeliti,  2015;  Foody  et  al.,  2015;  Senaratne  et  al.,  2016;  Fonte  et  al.,  2017), 
and  research  on  this  topic  will  continue  in  the  future,  not  least  because  improv¬ 
ing  the  methods  for  reporting  quality  could  end  up  becoming  a  catalyst  for 
the  widespread  diffusion  of  VGI  in  mainstream  geomatics  engineering.  Well 
established  methods  for  spatial  data  quality  evaluation  (e.g.  ISO  specifications), 
while  still  valid,  need  to  be  supplemented  with  additional  evaluation  tools  that 
take  the  specific  nature  of  VGI  into  account  (Antoniou  and  Skopeliti,  2015; 
Fonte  et  al.,  2017).  If  adequate  quality  assurance  tools  and  algorithms  fail  to
materialise,  then  the  future  uses  of  VGI  might  not  expand  much  beyond  what
we  see  today.  That  said,  VGI  is  highly  interdisciplinary,  combining  underlying
social,  economic  and  technological  factors  within  the  geospatial  domain;  the 
result  is  the  recording  of  space  and  phenomena  based  on  what  citizens  perceive 
to  be  important.  Thus,  uncertainty,  biases  and  noise  in  the  data  might  never  be 
fully  eliminated.  Instead,  we  need  to  understand,  model  and  handle  these  issues 
so  that  VGI  can  be  used  effectively. 

Future  efforts  might  focus  on  data  harmonisation,  which  can  play  an  impor¬ 
tant  role  in  the  era  of  big  data  since  it  may  enable  data  comparison,  allowing 
the  application  of  the  law  of  large  numbers,  i.e.  the  tendency  to  arrive  at  the 
expected  value  by  averaging  the  results  obtained  from  repeating  an  experiment 
a  large  number  of  times  (Kuhn,  2007),  and  contribute  to  an  automated  and  fast 
preliminary  data  quality  assessment  and  even  data  conflation.  To  address  the 
availability  of  multiple  sources  that  may  potentially  be  useful,  methodologies 
need  to  be  developed  to  assist  users  in  choosing  the  right  dataset  or  the  right 
combination  of  datasets  for  each  application.  Decisions  such  as  these  will  be 
aided  by  the  provision  of  information  about  the  data,  and  hence  metadata  are 
likely  to  become  increasingly  important  accompaniments  of  citizen-derived 
datasets.  Given  the  huge  amount  of  VGI  foreseen  in  the  future,  it  is  likely  that 
there  will  be  a  focus  on  the  development  of  approaches  that  are  more  auto¬ 
mated  for  the  assessment  of  VGI  quality;  this  development  will  be  challenging 
given  the  greatly  varied  nature  of  the  data,  which  can  be  unstructured  and  het¬ 
erogeneous,  but  is  nevertheless  of  high  potential  value. 
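
The law of large numbers mentioned above can be illustrated with a small simulation: many independent, noisy observations of the same location are averaged, and the mean moves closer to the true position as the number of observations grows. This is only a sketch under stated assumptions. The Gaussian noise model, the 10 m standard deviation, the coordinates and the function names are all hypothetical, and averaging of this kind reduces random error only; it does not remove a systematic bias shared by all contributors.

```python
import random
from statistics import fmean


def mean_position(observations):
    """Average a list of (x, y) observations of the same feature."""
    xs, ys = zip(*observations)
    return (fmean(xs), fmean(ys))


def simulate_observations(true_xy, sigma_m, n, seed=0):
    """Draw n observations with independent Gaussian noise (an assumed model)."""
    rng = random.Random(seed)  # seeded so that the simulation is repeatable
    tx, ty = true_xy
    return [(rng.gauss(tx, sigma_m), rng.gauss(ty, sigma_m)) for _ in range(n)]


def error_m(estimate, true_xy):
    """Euclidean distance between an estimate and the true position."""
    dx = estimate[0] - true_xy[0]
    dy = estimate[1] - true_xy[1]
    return (dx * dx + dy * dy) ** 0.5


# Hypothetical feature at (100 m, 200 m) in a local planar coordinate system.
truth = (100.0, 200.0)
for n in (10, 100, 10000):
    obs = simulate_observations(truth, sigma_m=10.0, n=n, seed=1)
    print(n, round(error_m(mean_position(obs), truth), 2))
```

Averaging is only meaningful once the observations have been harmonised so that they refer to the same feature and the same coordinate system, which is why harmonisation is a precondition for this kind of automated assessment.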

5  VGI  in  Science 

Despite  VGI  quality  being  an  obstacle  to  the  larger  diffusion  of  crowdsourced 
data  in  everyday  applications,  there  has  been  considerable  use  of  VGI  in  scien¬ 
tific  research,  in  particular  in  citizen  science  projects.  Citizen  science  typically 
refers  to  the  involvement  of  citizens  in  scientific  research,  either  in  collabora¬ 
tion  with  or  under  the  direction  of  professional  scientists  (Silvertown,  2009). 
A  considerable  number  of  such  projects  actively  use  geospatial  or  geotagged 
data.  Citizens  usually  use  smartphones,  cheap  do-it-yourself  devices  or  more 
advanced  purpose-built  sensors  to  observe  or  measure  a  phenomenon  associ¬ 
ated  with  geographic  information  on  a  volunteered  basis. 

Large-scale  scientific  projects  that  need  a  regional  or  even  global-wide  spa¬ 
tial  coverage  are  now  feasible  via  the  power  of  the  crowd.  In  fact,  any  project 
of  such  scale  needs  to  seek  assistance  from  the  crowd  in  order  to  collect  the 
volumes  of  data  needed  for  research.  Examples  include  the  Christmas  Bird 
Count³,  Asteroid  Zoo⁴  or  iNaturalist⁵.  Apart  from  simple  data  collection,  people
participating  in  citizen  science  projects  might  get  more  involved  in  the
analysis  of  the  data  or  in  the  interpretation  of  the  results;  for  an  analysis  on  the
typology  of  participation  see  Haklay  (2013).  This  increasing  trend  in  citizen
participation  in  citizen  science  projects  will  most  likely  continue  in  the  future,
particularly  given  the  success  of  many  different  citizen  science  projects  and  the 
active  interest  shown  by  authorities  such  as  the  European  Union  in  building 
citizen  observatories.  This  trend  is  also  an  important  development  for  VGI 
on  many  levels.  First,  as  more  and  more  citizens  get  actively  involved  in  sci¬ 
entific  projects  at  a  local  or  global  scale,  collaboration  and  volunteerism  will 
become  stronger.  Also,  involvement  in  science  has  much  to  teach  enthusiastic 
but  untrained  contributors  of  VGI.  If  we  start  considering  VGI  observations 
and  measurements  as  scientific  ones,  then  following  rigorous  data  protocols 
for  production  and  evaluation,  explicitly  documenting  measurements  with 
metadata,  and  the  ability  to  replicate  results  may  become  more  important  for 
VGI  projects;  in  some  cases  it  may  even  become  obligatory,  as  with  many  cur¬ 
rent  citizen  science  projects. 

6  VGI,  Citizens  and  Societies 

Throughout  the  book,  it  has  been  repeatedly  shown  that  the  driving  force  of 
VGI  is  volunteers  and  their  modes  of  engagement.  Although  technological 
advancements  provide  the  means  for  novel  ways  of  ubiquitous  data  capturing, 
what  transforms  the  technological  means  into  a  global-wide  phenomenon  that 
challenges  the  fundamentals  of  the  geospatial  domain  is  the  role  of  citizens 
and  their  engagement  with  volunteered  contributions  of  location-based  data. 
Consequently,  the  future  of  VGI  is  closely  related  to  the  future  of  social  trends 
and  social  evolution. 

Crowdsourcing,  volunteerism,  active  communities,  citizen  science  and  social 
enterprises  are  early  formations  that  can  take  the  lead  in  the  sustainable  pro¬ 
duction  of  VGI.  If  such  social  initiatives  evolve  further,  gain  momentum  and 
become  commonplace,  then  the  bottom-up  production  of  geotagged  data  will 
rise  to  entirely  new  levels.  For  example,  it  is  worth  noting  how  online  commu¬ 
nities  in  citizen  science  projects  address  real-world  problems.  Similar  examples 
exist  in  the  VGI  sphere,  and  can  be  found  in  the  efforts  of  the  Humanitarian 
OpenStreetMap  Team  (HOT),  which  mobilises  volunteers  in  mapping  areas
that  have  been  hit  by  natural  disasters.  Interestingly,  such  grassroots  collabora¬ 
tion  overcomes  societal  barriers  and  enables  citizens  to  participate  in  the  man¬ 
agement  and  improvement  of  quality  of  life,  a  common  goal  of  visions  such  as 
Digital  Earth  and  Smart  Cities. 

A  particularly  intriguing  future  development  might  arise
if  we  consider  location  and  spatial  information  as  common  goods  (Roche  et  al., 
2012)  that  are  mainly  produced  and  maintained  by  people.  What  changes 
will  this  generate  in  our  society?  What  will  be  the  benefits  to  and  responsi¬ 
bilities  of  the  citizens  and  the  authorities?  For  instance,  we  will  need  to  steer 
future  societies  into  geospatial  crowdsourcing,  understand  its  value,  its  benefits,
its  potential  and  the  steps  that  we  need  to  take  in  order  to  create  and
sustain  spatial  infrastructures.  Consequently,  citizens  should  be  initiated  and
trained  into  the  world  of  geospatial  information  from  the  early  years  of  their 
education.  Geography  curricula  and  lessons  should  be  redesigned  to  include 
the  collection  of  geotagged  information  in  a  volunteered  and  collaborative 
mode.  There  are  already  excellent  examples  available  to  provide  initial  best 
practice.  These  include  the  activities  of  the  Finnish  Environment  Institute  and 
the  Finnish  National  Land  Survey  Agency,  which  have  introduced  citizen  sci¬ 
ence  and  crowdsourced  data  collection  in  elementary  schools,  the  Museum 
National  d’Histoire  Naturelle  in  France,  which  introduced  collaborative  science 
on  biodiversity  into  French  schools,  or  the  positive  experiences  of  the  Dutch 
Kadaster,  which  introduced  a  new  curriculum  on  crowdsourcing  and  mapping 
in  elementary  schools. 

It  should  be  noted,  however,  that  future  developments  in  citizen  sensing  may 
require  greater  consideration  of  the  citizen  as  well  as  the  end  use  of  the  data 
generated.  A  greater  understanding  of  citizen  sensors  is  required  as  there  is  a 
two-way  dialogue  between  those  using  and  contributing  the  VGI,  especially  as 
citizens  may  also  be  the  source  of  very  useful  ideas.  Feedback  to  citizen  con¬ 
tributors  is  likely  to  become  much  more  important,  especially  in  developing  the 
citizens’  skills  and  maintaining  motivation.  A  new  reality  in  which  the  role  of 
geospatial  information  is  highlighted,  which  renders  its  collection  and  mainte¬ 
nance  a  common  responsibility,  might  prove  a  very  efficient  way  to  secure  the 
motivation  and  long-term  engagement  from  large  parts  of  the  population  that 
is  needed  to  support  global-wide  geospatial  data  collection. 

7  Understanding  the  True  Value  of  VGI 

Much  of  the  literature  on  VGI  is  about  understanding  this  phenomenon.  The 
subjects  examined  range  from  the  motivation  behind  volunteered  contribu¬ 
tions,  the  quality  of  the  data  obtained  or  the  biases  that  VGI  datasets  might  pos¬ 
sess  to  the  integration  of  VGI  with  other  sources  of  data.  Little  has  been  written 
about  the  true  value  of  VGI.  By  ‘true  value’,  we  refer  to  what  VGI  has  offered  not 
only  to  the  geomatics  domain  but  also  to  people  and  society  as  a  whole. 

The  bottom-up  production  of  VGI  has  democratised  the  production  and  use 
of  GI.  VGI  has  changed  a  landscape  where  spatial  data  creation  was  once  the 
responsibility  and  privilege  of  a  few  governmental  agencies  or  large  corpora¬ 
tions  (e.g.  NMAs),  and  where  the  access  to  spatial  information  was  limited  and 
usually  very  expensive  for  the  public.  What  VGI  did,  and  probably  will  con¬ 
tinue  to  do  in  the  future,  was  to  create  a  closer  relationship  between  the  pub¬ 
lic,  on  the  one  hand,  and  geography,  cartography,  web  mapping  and  geospatial 
applications,  on  the  other  hand;  in  a  sense,  the  public  have  been  introduced  to 
the  value  of  GI.  The  omnipresence  of  GI  in  everyday  devices  and  the  multiple 
applications  and  services  offered  today  that  are  based  on  spatial  data  would  not 
have  been  possible  without  this  new,  enlightened  relationship.  Moreover,  there
is  a  constantly  increasing  demand  for  more  GI,  both  in  terms  of  quantity  and
detail.  As  VGI  has,  in  a  sense,  spatially  enabled  our  societies,  the  need  for  more 
data  of  this  nature  will  only  intensify  in  the  future.  Now,  for  the  first  time,  it  is 
possible  to  have  a  tangible  picture  of  how  people  understand  space,  what  mat¬ 
ters  to  them  and  what  they  think  needs  to  be  on  a  map.  The  horizon  of  what 
GI  should  cover  has  been  considerably  broadened,  ranging  from  the  mapping 
of  litter⁶,  noise  pollution  (Maisonneuve  et  al.,  2010)  and  other  relevant  urban
problems⁷  to  the  support  of  Smart  Cities  and  a  wealth  of  other  applications.
This  information  is  valuable  for  understanding  how  societies  function  and 
what  we  need  to  do  in  the  future  to  help  improve  them. 


8  Future  Legal  and  Ethical  Concerns 

The  importance  of  legal  and  ethical  issues  has  already  been  raised  in  Chapter 
6  by  Mooney  et  al.  (2017),  but  much  more  attention  will  need  to  be  given  to 
these  issues  in  the  future.  It  is  anticipated  that  VGI  will  increasingly  be  har¬ 
vested  from  diverse  sources  including  social  media  and  wearable  devices. 
While  potentially  yielding  vast  amounts  of  useful  VGI,  including  information 
about  human  location,  movement  and  behaviour,  this  comes  with  a  suite  of 
data  privacy,  ethical  and  legal  concerns.  These  are  complex  issues,  since  legisla¬ 
tion  tends  to  lag  behind  advances  in  technology  and  also  differs  from  country 
to  country.  There  are  also  serious  concerns  with  the  reuse  of  VGI;  in  many 
instances,  especially  when  it  is  mined  from  open  resources,  VGI  may  be  used 
for  different  applications  than  the  original  purpose  of  data  collection,  which 
some  volunteers  may  be  uncomfortable  with.  As  the  ability  to  integrate  and 
fuse  together  greater  numbers  of  complex  and  disparate  datasets  increases,  it  is 
of  crucial  importance  that  the  issue  of  data  reuse  be  addressed.  Data  reuse  also 
links  to  legal  concerns;  for  example,  if  the  VGI  was  acquired  by  digitising  from 
a  map  or  image  without  the  relevant  permissions,  what  are  the  implications 
for  those  that  reuse  the  VGI?  Equally  important  are  possible  cases  of  vandal¬ 
ism.  Intentional  deterioration  of  the  quality  of  a  VGI  dataset  or  the  insertion 
of  false  data  could  have  considerable  ramifications  if  the  data  are  then  used  in 
decision-making  or  policy  implementation.  It  is  anticipated  that  in  the  future, 
as  VGI  gains  momentum,  there  will  be  a  need  to  better  safeguard  the  integrity 
and  objectivity  of  this  data  source. 


9  The  Final  Word 

This  is  a  time  of  very  rapid  change  -  in  the  last  decade  the  geomatics  domain  has 
witnessed  unprecedented  growth.  GI  has  moved  from  the  control  of  a  few  pro¬ 
ducers  to  the  hands  of  many,  who  now  have  the  power  to  produce  and  update 
many  different  spatial  data  repositories.  At  the  same  time,  demand  for  timely,
free  and  accurate  GI  is  multiplying.  Whether  from  the  move  to  a  digitised  environment
or  from  the  frequent  use  of  map-based  applications,  the  value  of  GI
has  been  widely  recognised  by  many.  VGI  has  been  a  catalyst  for  these  changes,
but  we  are  currently  standing  at  a  very  important  crossroads:  either  VGI  will 
move  to  a  new  level  in  which  it  will  be  the  key  enabling  factor  for  future  devel¬ 
opments  or  it  will  remain  at  current  levels  of  acceptance,  running  the  danger  of 
being  overtaken  by  developments  in  other  domains,  and  possibly  even  decline
or  decay.  The  responsibility  for  what  happens  is,  at  least  partially,  in  the  hands  of
GI  professionals  as  well  as  citizens.  Fortunately,  networks  such  as  COST  Action
TD1202⁸,  out  of  which  this  book  has  arisen,  are  succeeding  in  bringing  together
an  interdisciplinary  community  including  professionals  from  NMAs.  By  work¬ 
ing  together  to  address  VGI  quality  issues  and  potential  dangers  to  the  field  of 
VGI,  we  will  strive  to  ensure  that  VGI  has  a  strong  and  exciting  future. 


Notes 


1  http://www.geograph.org.uk/ 

2  https://get.google.com/tango/ 

3  http://www.audubon.org/conservation/science/christmas-bird-count 

4  https://www.asteroidzoo.org/ 

5  http://www.inaturalist.org/ 

6  http://www.litterati.org/ 

7  https://www.fixmystreet.com/ 

8  http://www.citizensensor-cost.eu/

Maps  are  a  fundamental  resource  in  a  diverse  array  of  applications
ranging  from  everyday  activities,  such  as  route  planning  through 
the  legal  demarcation  of  space  to  scientific  studies,  such  as  those 
seeking  to  understand  biodiversity  and  inform  the  design  of  nature 
reserves  for  species  conservation.  For  a  map  to  have  value,  it  should 
provide  an  accurate  and  timely  representation  of  the  phenomenon 
depicted  and  this  can  be  a  challenge  in  a  dynamic  world.  Fortunately, 
mapping  activities  have  benefited  greatly  from  recent  advances  in 
geoinformation  technologies.  Satellite  remote  sensing,  for  example,
now  offers  unparalleled  data  acquisition  and  authoritative  mapping 
agencies  have  developed  systems  for  the  routine  production  of  maps 
in  accordance  with  strict  standards.  Until  recently,  much  mapping 
activity  was  in  the  exclusive  realm  of  authoritative  agencies  but 
technological  development  has  also  allowed  the  rise  of  the  amateur 
mapping  community.  The  proliferation  of  inexpensive  and  highly 
mobile  and  location  aware  devices  together  with  Web  2.0  technology 
have  fostered  the  emergence  of  the  citizen  as  a  source  of  data. 
Mapping  presently  benefits  from  vast  amounts  of  spatial  data  as  well 
as  people  able  to  provide  observations  of  geographic  phenomena, 
which  can  inform  map  production,  revision  and  evaluation.  The 
great  potential  of  these  developments  is,  however,  often  limited 
by  concerns.  The  latter  span  issues  from  the  nature  of  the  citizens 
through  the  way  data  are  collected  and  shared  to  the  quality  and 
trustworthiness  of  the  data.  This  book  reports  on  some  of  the  key  issues 
connected  with  the  use  of  citizen  sensors  in  mapping.  It  arises  from  a 
European  Co-operation  in  Science  and  Technology  (COST)  Action, 
which  explored  issues  linked  to  topics  ranging  from  citizen  motivation, 
data  acquisition,  data  quality  and  the  use  of  citizen  derived  data 
in  the  production  of  maps  that  rival,  and  sometimes  surpass,  maps 
arising  from  authoritative  agencies.